Communication device and communication method

ABSTRACT

Audio data transmitted from another communication device is received by a receiver, and the received audio data is output to a speaker of a television. Audio is generated based on the output audio data, and the audio is input to a microphone of a camera/microphone device. Audio data corresponding to the input audio is input to an echo cancellation processing unit. An amount of distortion of the input audio data with respect to the received audio data is detected by a distortion detector using the output of the echo cancellation processing unit. It is determined whether or not the detected amount of distortion exceeds an acceptable amount. When the amount of distortion exceeds the acceptable amount, notification data for presenting to the user a request for change of an output condition of the audio in the television is produced by a notification signal producer.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a communication device capable of transmitting and receiving audio data as well as video data, and to a communication method.

2. Description of the Background Art

In a communication system and a data communication method described in JP 2010-521856 A, communication is performed between a first terminal and a second terminal through a network. In the communication system, each of the first and second terminals includes a receiving circuit and a transmitting circuit. In addition, each of the first and second terminals includes a web camera and a microphone as input devices, and includes a display screen and a loudspeaker as output devices.

For example, video of a user of the first terminal is input to the web camera, and audio of the user of the first terminal is input to the microphone in the first terminal. In the second terminal, video of a user of the second terminal is input to the web camera, and audio of the user of the second terminal is input to the microphone.

In the transmitting circuit of the first terminal, data based on the video and audio input to the web camera and the microphone is transmitted to the second terminal through the network. In this case, the data transmitted from the first terminal through the network is received in the receiving circuit of the second terminal, and video and audio based on the received data are output from the display screen and the loudspeaker.

Similarly, in the transmitting circuit of the second terminal, data based on the video and audio input to the web camera and the microphone is transmitted to the first terminal through the network. In this case, the data transmitted from the second terminal through the network is received in the receiving circuit of the first terminal, and video and audio based on the received data are output from the display screen and the loudspeaker.

Thus, the user of the first terminal can talk with the user of the second terminal while visually recognizing the video of the user of the second terminal. Similarly, the user of the second terminal can talk with the user of the first terminal while visually recognizing the video of the user of the first terminal.

In the foregoing communication system, the audio based on the data transmitted from the second terminal to the first terminal is output from the loudspeaker of the first terminal, for example. At this time, part of the audio output from the loudspeaker of the first terminal might be input to the microphone of the first terminal due to room reverberation. In the transmitting circuit of the first terminal, the data based on the audio input from the microphone is transmitted to the second terminal through the network. Therefore, every time the user of the second terminal speaks, his or her own voice returns from the loudspeaker of the second terminal as an echo. In this case, the user of the second terminal feels a sense of strangeness in conversation with the user of the first terminal.

An echo canceller is known as a configuration for inhibiting the audio input to the microphone from being output from the loudspeaker in the second terminal as described above (see JP 2010-283483 A, for example).

In the echo canceller, the acoustic echo to be input to the microphone of the first terminal is estimated based on the audio output from the loudspeaker of the first terminal. An audio signal based on the estimated acoustic echo is then subtracted from the audio signal input from the microphone of the first terminal. This inhibits the acoustic echo from being output from the loudspeaker of the second terminal.
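The estimate-and-subtract principle can be sketched with a normalized least-mean-squares (NLMS) adaptive filter, a common echo-canceller adaptation rule. This is an illustrative assumption: the text does not state that the cited echo canceller uses NLMS, and the function name, tap count and step size below are hypothetical. The filter learns the loudspeaker-to-microphone path from the far-end reference and subtracts its echo estimate from the microphone signal.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-9):
    """Subtract an adaptively estimated echo from the microphone signal.

    far_end: samples sent to the loudspeaker (reference signal)
    mic:     samples captured by the microphone (echo plus local speech)
    Returns the residual samples that would be transmitted.
    """
    w = [0.0] * taps     # running estimate of the loudspeaker-to-microphone path
    hist = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        hist = [x] + hist[:-1]
        echo_est = sum(wi * hi for wi, hi in zip(w, hist))
        e = d - echo_est  # microphone signal minus estimated echo
        residual.append(e)
        norm = eps + sum(h * h for h in hist)
        # NLMS update: step toward a smaller error, normalized by input power
        w = [wi + mu * e * hi / norm for wi, hi in zip(w, hist)]
    return residual


if __name__ == "__main__":
    rng = random.Random(0)
    far = [rng.uniform(-1.0, 1.0) for _ in range(3000)]
    # Hypothetical room: echo arrives 2 and 3 samples late with gains 0.6 and 0.3
    mic = [0.6 * (far[n - 2] if n >= 2 else 0.0)
           + 0.3 * (far[n - 3] if n >= 3 else 0.0) for n in range(len(far))]
    res = nlms_echo_cancel(far, mic)
    echo_energy = sum(v * v for v in mic[-500:])
    residual_energy = sum(v * v for v in res[-500:])
    print("residual/echo energy, last 500 samples:", residual_energy / echo_energy)
```

With a linear echo path and white-noise far-end audio, the residual energy after convergence falls far below the echo energy. The sketch omits double-talk detection, which a practical canceller needs so that local speech does not disturb adaptation.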

Even when the configuration of the echo canceller is adopted, however, the audio output from the loudspeakers used in the first terminal and the second terminal may be distorted.

In addition, image processing and sound processing are performed on the communicated data in each terminal of the foregoing communication system. Therefore, delay occurs between a timing at which the data is input and a timing at which video and audio based on the input data are output in each terminal.

As described above, when the output audio is distorted or delayed in the first terminal, an effect of the audio input to the first terminal, which the echo canceller cannot completely remove, remains in the audio output from the second terminal. In this case, the user of the second terminal feels a sense of strangeness in conversation. Similarly, when the output audio is distorted or delayed in the second terminal, an effect of the audio input to the second terminal remains in the audio output from the first terminal. In this case, the user of the first terminal feels a sense of strangeness in conversation.

BRIEF SUMMARY OF THE INVENTION

An object of the present invention is to provide a communication device and a communication method capable of reducing a feeling of strangeness that a user feels.

(1) According to an aspect of the present invention, a communication device capable of transmitting and receiving video data and audio data to and from another device and being connected to an audio input device and an audio output device includes a receiver configured to be capable of receiving the audio data transmitted from the another device, an audio data output unit arranged to output the audio data received by the receiver to the audio output device, an audio data input unit to which audio data is input from the audio input device, a transmitter configured to be capable of transmitting the audio data input by the audio data input unit to the another device, a difference detector arranged to detect a geometric or temporal difference between a waveform represented by the audio data received by the receiver and a waveform represented by the audio data input by the audio data input unit when audio is output by the audio output device based on the audio data output from the audio data output unit and audio data based on the output audio is input from the audio input device to the audio data input unit, a determiner arranged to determine whether or not the difference detected by the difference detector exceeds a predetermined acceptable amount, and a presentation signal producer arranged to produce a presentation signal for presenting to a user a request for change of an output condition of the audio based on the audio data output by the audio data output unit when the determiner determines that the difference exceeds the acceptable amount.

In the communication device, the audio data transmitted from the another device is received by the receiver, and the audio data received by the receiver is output to the audio output device by the audio data output unit. The audio data is input from the audio input device to the audio data input unit. The audio data input by the audio data input unit is transmitted to the another device by the transmitter. This allows for conversation between the user of the communication device and a user of the another device.

When the audio output by the audio output device connected to the communication device is input to the audio input device connected to the communication device, the audio input to the another device is output from the another device via the communication device. In this case, in the communication device, components corresponding to the audio data received by the receiver are removed from the audio data input to the audio data input unit, so that the audio input to the another device can be prevented from being output from the another device via the communication device.

When the audio output by the audio output device is distorted or delayed, however, an effect of the audio input to the another device remains in the audio output by the another device. This provides a sense of strangeness to the user of the another device.

Therefore, when the audio is output by the audio output device based on the audio data output from the audio data output unit and the audio data based on the output audio is input from the audio input device to the audio data input unit, the geometric or temporal difference between the waveform represented by the audio data received by the receiver and the waveform represented by the audio data input by the audio data input unit is detected by the difference detector.

Here, the geometric difference of a waveform means a degree of discrepancy between one waveform and another waveform. When a waveform obtained by multiplying the amplitude of the one waveform by an arbitrary coefficient is equal to the other waveform, the geometric difference between the two waveforms is zero. The temporal difference of a waveform means a degree of shift on the time axis between the one waveform and the other waveform.
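The text leaves the exact metric for the geometric difference open. One possible measure, sketched below under that assumption, fits the best amplitude coefficient by least squares and reports the remaining energy as a fraction of the observed waveform's energy, so that it is zero exactly when one waveform is a scaled copy of the other, as the definition requires. The function name and the normalization are hypothetical.

```python
def geometric_difference(ref, obs):
    """Scale-invariant shape mismatch between two equal-length waveforms.

    Finds the least-squares coefficient a that best maps ref onto obs and
    returns the energy of (obs - a * ref) relative to the energy of obs.
    The result is 0.0 when obs is ref multiplied by some coefficient and
    lies in [0, 1] up to rounding.
    """
    ref_energy = sum(r * r for r in ref)
    obs_energy = sum(o * o for o in obs)
    if obs_energy == 0.0:
        return 0.0  # obs is ref scaled by 0
    if ref_energy == 0.0:
        return 1.0  # no scaling of silence can reproduce a nonzero obs
    a = sum(r * o for r, o in zip(ref, obs)) / ref_energy
    err = sum((o - a * r) ** 2 for r, o in zip(ref, obs))
    return err / obs_energy
```

A uniformly amplified copy scores (numerically) zero, while a clipped copy scores above zero even though its zero crossings line up with the reference.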

The determiner determines whether or not the detected difference exceeds the predetermined acceptable amount. When it is determined that the difference exceeds the acceptable amount, the presentation signal for presenting to the user the request for change of the output condition of the audio based on the audio data output by the audio data output unit is produced by the presentation signal producer.

When the request for change of the output condition of the audio is presented based on the presentation signal, the user of the communication device is encouraged to change the output condition of the audio by the audio output device. As a result, the user of the communication device changes the output condition of the audio by the audio output device, so that a sense of strangeness that the user of the another device feels can be reduced.
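The determiner and the presentation signal producer reduce to a threshold comparison followed by building a request. A minimal sketch, in which the class and function names, the fields of the signal, and the strict "exceeds" comparison are assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PresentationSignal:
    """Hypothetical payload describing the request presented to the local user."""
    request: str
    difference: float
    acceptable_amount: float


def exceeds(difference: float, acceptable_amount: float) -> bool:
    """Determiner: True only when the difference is strictly above the acceptable amount."""
    return difference > acceptable_amount


def produce_presentation_signal(difference: float, acceptable_amount: float,
                                request: str) -> Optional[PresentationSignal]:
    """Presentation signal producer: emits a signal only when the determiner fires."""
    if exceeds(difference, acceptable_amount):
        return PresentationSignal(request, difference, acceptable_amount)
    return None
```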

(2) The difference detector may include a removal processing unit arranged to perform processing of removing components corresponding to the audio data received by the receiver from the audio data input to the audio data input unit, and a level detector arranged to detect a level of the audio data processed by the removal processing unit as the difference when the audio is output by the audio output device based on the audio data output from the audio data output unit and the audio data based on the output audio is input from the audio input device to the audio data input unit, and the presentation signal may include a request for adjustment of audio volume of the audio output device as the request for change of the output condition of the audio.

In this case, the removal processing unit performs the processing of removing the components corresponding to the audio data received by the receiver from the audio data input to the audio data input unit. When the audio output from the audio output device is distorted, noise to be caused by the distorted audio remains in the audio data processed by the removal processing unit. Therefore, when the audio is output by the audio output device based on the audio data output from the audio data output unit and the audio data based on the output audio is input from the audio input device to the audio data input unit, the level of the audio data processed by the removal processing unit is detected as the difference by the level detector. When the level of the audio data processed by the removal processing unit exceeds the acceptable amount, it can be determined that the audio output by the audio output device is distorted by a certain amount or more. In this case, the presentation signal for presenting to the user the request for adjustment of the audio volume of the audio output device is produced by the presentation signal producer.
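Why a loud, distorted loudspeaker leaves a measurable residual can be illustrated numerically. In the sketch below, which is a deliberate simplification, the removal processing unit is idealized as subtracting a least-squares scaled copy of the received audio (no delay, no adaptive filter), and the hypothetical `speaker_output` function models distortion as hard clipping at high volume. Only the nonlinear part of the captured audio survives the removal, so the residual level rises once the volume drives the speaker into clipping.

```python
import math


def residual_level_db(received, captured):
    """Level of the captured audio left after removing its linear match to received.

    The removal step is idealized as subtracting a least-squares scaled copy of
    received (no delay, no adaptive filter). Returns the residual RMS in dB
    relative to an amplitude of 1.0, or -inf if nothing remains.
    """
    if not received:
        return -math.inf
    energy = sum(r * r for r in received)
    gain = sum(r * c for r, c in zip(received, captured)) / energy if energy else 0.0
    residual = [c - gain * r for r, c in zip(received, captured)]
    rms = math.sqrt(sum(e * e for e in residual) / len(residual))
    return 20.0 * math.log10(rms) if rms > 0.0 else -math.inf


def speaker_output(samples, volume, limit=1.0):
    """Hypothetical loudspeaker: amplify by volume, then hard-clip at +/-limit."""
    return [max(-limit, min(limit, volume * s)) for s in samples]


if __name__ == "__main__":
    received = [math.sin(2 * math.pi * n / 48) for n in range(4800)]
    room_gain = 0.5  # hypothetical attenuation between speaker and microphone
    for volume in (0.8, 2.0):
        captured = [room_gain * s for s in speaker_output(received, volume)]
        print(volume, residual_level_db(received, captured))
```

At a volume of 0.8 the sine stays below the clipping limit and the residual is essentially zero; at 2.0 it clips and leaves harmonics at roughly -20 dB, so an acceptable amount of, say, -40 dB (a hypothetical value) would trigger the volume request.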

The request for adjustment of the audio volume of the audio output device is presented based on the presentation signal, so that the user of the communication device is encouraged to adjust the audio volume of the audio output device. The user of the communication device adjusts the audio volume of the audio output device, so that distortion of the audio output by the audio output device can be reduced. This suppresses noise to be caused by the distorted audio in the audio data processed by the removal processing unit. As a result, a sense of strangeness that the user of the another device feels can be reduced.

(3) The presentation signal may include a request for decrease of the audio volume of the audio output device as the request for change of the output condition of the audio.

In this case, when the level of the audio data processed by the removal processing unit exceeds the acceptable amount, the presentation signal for presenting to the user the request for decrease of the audio volume of the audio output device is produced by the presentation signal producer. The request for decrease of the audio volume of the audio output device is presented based on the presentation signal, so that the user of the communication device is encouraged to decrease the audio volume of the audio output device. The user of the communication device decreases the audio volume of the audio output device, so that distortion of the audio output from the audio output device is reduced. This sufficiently suppresses noise to be caused by the distorted audio in the audio data processed by the removal processing unit. As a result, a sense of strangeness that the user of the another device feels can be reduced.

(4) The communication device, which is capable of being connected to a video output device, may further include a video data output unit arranged to output video data to the video output device, wherein the presentation signal producer may produce video data for displaying the request for adjustment of the audio volume of the audio output device as the presentation signal, and the video data output unit may output the presentation signal produced by the presentation signal producer to the video output device.

In this case, the video data for displaying the request for adjustment of the audio volume of the audio output device is produced by the presentation signal producer. The produced video data is output from the video data output unit to the video output device. This causes the request for adjustment of the audio volume of the audio output device to be displayed by the video output device. As a result, the user of the communication device can be encouraged by the video to adjust the audio volume of the audio output device without being interrupted during conversation.

(5) The audio output device may be configured to be capable of changing an amount of delay of the input audio data, the difference detector may include a delay detector arranged to detect the amount of delay of the audio data input to the audio data input unit with respect to the audio data received by the receiver as the difference when the audio is output by the audio output device based on the audio data output from the audio data output unit and the audio data based on the output audio is input from the audio input device to the audio data input unit, and the presentation signal may include a request for operation involving change of the amount of delay of the audio data in the audio output device as the request for change of the output condition of the audio.

When the audio data is significantly delayed in the audio output device, the period of time until the audio input to the another device is output from the audio output device connected to the communication device and the period of time until the audio input to the audio input device connected to the communication device is output from the another device both increase. This provides a sense of strangeness to the user of the communication device and the user of the another device during conversation.

Therefore, when the audio is output by the audio output device based on the audio data output from the audio data output unit and the audio data based on the output audio is input from the audio input device to the audio data input unit, the amount of delay of the audio data input to the audio data input unit with respect to the audio data received by the receiver is detected as the difference by the delay detector. When the detected amount of delay exceeds the acceptable amount, the presentation signal for presenting to the user the request for the operation involving change of the amount of delay of the audio data in the audio output device is produced by the presentation signal producer.
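One common way to measure this amount of delay, assumed here rather than specified by the text, is to pick the lag that maximizes the cross-correlation between the received audio data and the captured audio data. The function names, the 8 kHz sample rate and the acceptable amount in milliseconds are hypothetical.

```python
import random


def detect_delay_samples(received, captured, max_lag):
    """Return the lag (in samples) of captured relative to received that
    maximizes their cross-correlation, searching lags 0..max_lag."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(r * c for r, c in zip(received, captured[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_exceeds(received, captured, sample_rate_hz, max_lag, acceptable_ms):
    """Delay detector plus determiner: (delay in ms, True if above acceptable_ms)."""
    lag = detect_delay_samples(received, captured, max_lag)
    delay_ms = 1000.0 * lag / sample_rate_hz
    return delay_ms, delay_ms > acceptable_ms


if __name__ == "__main__":
    rng = random.Random(0)
    received = [rng.uniform(-1.0, 1.0) for _ in range(1200)]
    # Hypothetical path: 240 samples (30 ms at 8 kHz) of delay, gain 0.4
    captured = [0.0] * 240 + [0.4 * r for r in received]
    print(delay_exceeds(received, captured, 8000, 400, acceptable_ms=20.0))
```

The brute-force search costs O(N x max_lag) operations; an FFT-based correlation is the usual choice when the search range is long.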

The request for the operation involving change of the amount of delay of the audio data is presented based on the presentation signal, so that the user of the communication device is encouraged to perform the operation involving change of the amount of delay of the audio data. The user of the communication device performs the operation involving change of the amount of delay of the audio data, so that the amount of delay of the audio data in the audio output device can be reduced. Thus, the period of time until the audio input to the another device is output from the audio output device connected to the communication device and the period of time until the audio input to the audio input device connected to the communication device is output from the another device can be shortened. As a result, a sense of strangeness that the user of the communication device and the user of the another device feel during conversation can be reduced.

(6) The communication device may be capable of being connected to a video output device, wherein the video output device may be configured to be capable of displaying video based on video data in a display mode selected from a plurality of display modes and may be set such that the amount of delay of the video data differs among the plurality of display modes, the audio output device may be configured to adjust the amount of delay of the audio data such that the audio data is synchronized with the video data in the video output device, the receiver may be configured to be capable of receiving the video data transmitted from the another device, the communication device may further include a video data output unit arranged to output the video data received by the receiver to the video output device, and the presentation signal may include a request for operation of changing the display mode of the video output device as the request for the operation involving change of the amount of delay of the audio data in the audio output device.

In this case, the amount of delay of the video data differs according to the display mode of the video displayed by the video output device. In the audio output device, the amount of delay of the audio data is adjusted such that the audio data is synchronized with the video data in the video output device. Therefore, changing the display mode changes the amount of delay of the audio data. When the video output device is set to a display mode with a large amount of delay of the video data, the amount of delay of the audio data also increases. In this case, the period of time until the audio input to the another device is output from the audio output device connected to the communication device and the period of time until the audio input to the audio input device connected to the communication device is output from the another device are increased. This provides a sense of strangeness to the user of the communication device and the user of the another device during conversation.

Therefore, when the amount of delay detected by the delay detector exceeds the acceptable amount, the presentation signal for presenting to the user the request for the operation of changing the display mode of the video output device is produced by the presentation signal producer. The request for the operation of changing the display mode of the video output device is presented based on the presentation signal, so that the user of the communication device is encouraged to perform the operation of changing the display mode of the video output device. When the user of the communication device performs this operation, the amount of delay of the video data changes and, accordingly, the amount of delay of the audio data in the audio output device changes, so that the amount of delay of the audio data can be reduced. As a result, a sense of strangeness that the user of the communication device and the user of the another device feel during conversation can be reduced.
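The link between display mode and audio delay can be pictured as a lookup table like the one of FIG. 15. In the sketch below the mode names and delay values are invented placeholders, since the figure's actual values are not reproduced in this text, and recommending the lowest-delay mode is an assumed policy rather than one stated here.

```python
from typing import Dict, Optional

# Placeholder values only: FIG. 15 lists per-mode audio delay amounts, but the
# actual modes and figures are not given in this text.
AUDIO_DELAY_MS_BY_MODE: Dict[str, float] = {
    "standard": 120.0,
    "cinema": 150.0,
    "game": 40.0,
}


def display_mode_request(current_mode: str, detected_delay_ms: float,
                         acceptable_ms: float,
                         table: Dict[str, float] = AUDIO_DELAY_MS_BY_MODE) -> Optional[str]:
    """Return a request asking the user to change the display mode, or None."""
    if detected_delay_ms <= acceptable_ms:
        return None  # determiner: delay is within the acceptable amount
    target = min(table, key=table.get)  # mode with the smallest listed audio delay
    if target == current_mode:
        return None  # already in the lowest-delay mode; switching cannot help
    return (f"Audio delay is about {detected_delay_ms:.0f} ms. "
            f"Please switch the TV display mode to '{target}'.")
```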

(7) The presentation signal producer may produce video data for displaying the request for the operation of changing the display mode of the video output device as the presentation signal, and the video data output unit may output the presentation signal produced by the presentation signal producer to the video output device.

In this case, the video data for displaying the request for the operation of changing the display mode of the video output device is produced by the presentation signal producer. The produced video data is output to the video output device by the video data output unit. This causes the request for the operation of changing the display mode of the video output device to be displayed by the video output device. As a result, the user of the communication device can be encouraged by the video to perform the operation of changing the display mode of the video output device without being interrupted during conversation.

(8) The communication device may further include a delay unit arranged to delay the audio data received by the receiver, and a removal processing unit arranged to perform processing of removing components corresponding to the audio data delayed by the delay unit from the audio data input to the audio data input unit, wherein the transmitter may be configured to transmit the audio data processed by the removal processing unit to the another device.

In this case, the audio data received by the receiver is delayed by the delay unit. The removal processing unit performs the processing of removing the components corresponding to the audio data delayed by the delay unit from the audio data input to the audio data input unit. The audio data processed by the removal processing unit is transmitted to the another device by the transmitter. Thus, the audio input to the another device can be inhibited from being output from the another device via the communication device even when the audio output from the audio output device connected to the communication device is input to the audio input device connected to the communication device.

When the amount of delay of the audio data in the audio output device is decreased in response to the presentation signal, the components corresponding to the audio data received by the receiver can be removed from the audio data input to the audio data input unit without using a delay unit having a large amount of delay. This prevents the cost of the delay unit from increasing. As a result, the cost of the communication device can be kept low.
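The cost argument follows from the size of the delay unit: a delay line holding D milliseconds of audio at a sample rate of f_s hertz must store D x f_s / 1000 samples. A minimal sketch, assuming 16-bit mono samples and using hypothetical class and function names:

```python
from collections import deque


class DelayUnit:
    """Fixed delay line: each output sample is the input from delay_samples calls earlier."""

    def __init__(self, delay_samples: int):
        self._buf = deque([0.0] * delay_samples, maxlen=delay_samples) if delay_samples > 0 else None

    def process(self, x: float) -> float:
        if self._buf is None:
            return x  # zero delay: pass through
        y = self._buf[0]     # oldest stored sample
        self._buf.append(x)  # maxlen discards the oldest sample just read
        return y


def delay_buffer_bytes(delay_ms: float, sample_rate_hz: int,
                       bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Storage a delay unit needs to hold delay_ms of audio."""
    samples = round(delay_ms * sample_rate_hz / 1000)
    return samples * bytes_per_sample * channels
```

For example, at 48 kHz a 150 ms delay needs 7,200 samples (14,400 bytes), whereas 40 ms needs 1,920 samples (3,840 bytes); the storage, and hence the cost, scales linearly with the delay the unit must cover.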

(9) According to another aspect of the present invention, a communication device capable of transmitting and receiving video data and audio data to and from another device and being connected to an audio input device and an audio output device includes a receiver configured to be capable of receiving the audio data transmitted from the another device, an audio data output unit arranged to output the audio data received by the receiver to the audio output device, an audio data input unit to which audio data is input from the audio input device, a transmitter configured to be capable of transmitting the audio data input by the audio data input unit to the another device, a difference detector arranged to detect a geometric or temporal difference between a waveform represented by the audio data received by the receiver and a waveform represented by the audio data input by the audio data input unit when audio is output by the audio output device based on the audio data output from the audio data output unit and audio data based on the output audio is input from the audio input device to the audio data input unit, a determiner arranged to determine whether or not the difference detected by the difference detector exceeds a predetermined acceptable amount, and a control signal producer arranged to produce a control signal for changing an output condition of the audio based on the audio data output by the audio data output unit when the determiner determines that the difference exceeds the acceptable amount.

In the communication device, the audio data transmitted from the another device is received by the receiver, and the audio data received by the receiver is output to the audio output device by the audio data output unit. The audio data is input from the audio input device to the audio data input unit. The audio data input by the audio data input unit is transmitted to the another device by the transmitter. This allows for conversation between a user of the communication device and a user of the another device.

When the audio output by the audio output device connected to the communication device is input to the audio input device connected to the communication device, the audio input to the another device is output from the another device via the communication device. In this case, in the communication device, components corresponding to the audio data received by the receiver are removed from the audio data input to the audio data input unit, so that the audio input to the another device can be prevented from being output from the another device via the communication device.

When the audio output by the audio output device is distorted or delayed, however, an effect of the audio input to the another device remains in the audio output by the another device. This provides a sense of strangeness to the user of the another device.

Therefore, when the audio is output by the audio output device based on the audio data output from the audio data output unit and the audio data based on the output audio is input from the audio input device to the audio data input unit, the geometric or temporal difference between the waveform represented by the audio data received by the receiver and the waveform represented by the audio data input by the audio data input unit is detected by the difference detector. The determiner determines whether or not the detected difference exceeds the predetermined acceptable amount. When it is determined that the difference exceeds the acceptable amount, the control signal for changing the output condition of the audio based on the audio data output by the audio data output unit is produced by the control signal producer.

The output condition of the audio is changed based on the control signal, so that a sense of strangeness that the user of the another device feels can be reduced without interruption during conversation.
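Unlike the presentation signal, the control signal acts on the audio output device directly. A minimal sketch of a control signal producer that lowers the volume by one step while the residual level exceeds the acceptable amount; the command type, the integer volume scale and the step size are hypothetical, and how the command reaches the audio output device is not specified here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VolumeCommand:
    """Hypothetical control signal asking the audio output device to set its volume."""
    target_volume: int


def produce_control_signal(level_db: float, acceptable_db: float, current_volume: int,
                           step: int = 2, min_volume: int = 0) -> Optional[VolumeCommand]:
    """Control signal producer: lower the volume by one step while the level is too high."""
    if level_db <= acceptable_db:
        return None  # determiner: difference within the acceptable amount
    if current_volume <= min_volume:
        return None  # nothing left to reduce
    return VolumeCommand(max(min_volume, current_volume - step))
```

Issuing one small step per detection period and re-measuring lets the level settle below the acceptable amount without an abrupt volume jump; that pacing is a design choice of this sketch.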

(10) According to still another aspect of the present invention, a communication method using a communication device capable of transmitting and receiving video data and audio data to and from another device and being connected to an audio input device and an audio output device includes the steps of receiving the audio data transmitted from the another device using a receiver of the communication device, outputting the received audio data from an audio data output unit of the communication device to the audio output device, inputting audio data from the audio input device to an audio data input unit of the communication device, transmitting the audio data input to the audio data input unit from a transmitter of the communication device to the another device, detecting a geometric or temporal difference between a waveform represented by the audio data received by the receiver and a waveform represented by the audio data input by the audio data input unit when audio is output by the audio output device based on the audio data output from the audio data output unit and audio data based on the output audio is input from the audio input device to the audio data input unit, determining whether or not the detected difference exceeds a predetermined acceptable amount, and outputting a presentation signal for presenting to a user a request for change of an output condition of the audio based on the audio data output by the audio data output unit when it is determined that the difference exceeds the acceptable amount.

In the communication method, the audio data transmitted from the another device is received by the receiver of the communication device, and the audio data received by the receiver is output to the audio output device by the audio data output unit of the communication device. The audio data is input from the audio input device to the audio data input unit. The audio data input by the audio data input unit is transmitted to the another device by the transmitter of the communication device. This allows for conversation between the user of the communication device and a user of the another device.

When the audio output by the audio output device connected to the communication device is input to the audio input device connected to the communication device, the audio input to the another device is output from the another device via the communication device. In this case, in the communication device, components corresponding to the audio data received by the receiver are removed from the audio data input to the audio data input unit, so that the audio input to the another device can be prevented from being output from the another device via the communication device.

When the audio output by the audio output device is distorted or delayed, however, an effect of the audio input to the another device remains in the audio output by the another device. This provides a sense of strangeness to the user of the another device.

Therefore, when the audio is output by the audio output device based on the audio data output from the audio data output unit and the audio data based on the output audio is input from the audio input device to the audio data input unit, the geometric or temporal difference between the waveform represented by the audio data received by the receiver and the waveform represented by the audio data input by the audio data input unit is detected. It is determined whether or not the detected difference exceeds the predetermined acceptable amount. When it is determined that the difference exceeds the acceptable amount, the presentation signal for presenting to the user the request for change of the output condition of the audio based on the audio data output by the audio data output unit is produced.

When the request for change of the output condition of the audio is presented based on the presentation signal, the user of the communication device is encouraged to change the output condition of the audio by the audio output device. As a result, the user of the communication device changes the output condition of the audio by the audio output device, so that a sense of strangeness that the user of the another device feels can be reduced.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 is a diagram for schematically explaining a communication system according to a first embodiment;

FIG. 2 is a block diagram showing the configuration of a terminal of FIG. 1;

FIG. 3 is a diagram showing a sign-in screen;

FIG. 4 is a diagram showing a user registration screen;

FIG. 5 is a diagram showing a contact screen;

FIG. 6 is a diagram showing a contact list screen;

FIG. 7 is a diagram showing a conversation screen;

FIG. 8 is a diagram showing an incoming call screen;

FIG. 9 is a diagram showing another example of the incoming call screen;

FIG. 10 is a block diagram showing the detailed configuration of a control LSI of FIG. 2;

FIG. 11 is a diagram for explaining echo cancellation processing by an echo cancellation processing unit;

FIG. 12 is a diagram showing one example of the conversation screen including notification video;

FIG. 13 is a flowchart showing one example of operation of the control LSI based on a conversation program according to the first embodiment;

FIG. 14 is a block diagram showing the configuration of a television according to a second embodiment;

FIG. 15 is a table listing the audio delay amounts corresponding to a plurality of display modes;

FIG. 16 is a block diagram showing the detailed configuration of the control LSI according to the second embodiment;

FIG. 17 is a diagram showing another example of the conversation screen including the notification video;

FIG. 18 is a conceptual diagram of the echo cancellation processing; and

FIG. 19 is a flowchart showing one example of the operation of the control LSI based on a conversation program according to the second embodiment.

DETAILED DESCRIPTION OF THE INVENTION

[1] First Embodiment

A communication device and a communication method according to one embodiment of the present invention will be described below with reference to the drawings.

(1) Schematic Description of Communication System

A communication system including the communication device according to the first embodiment will be described first. FIG. 1 is a diagram for schematically explaining the communication system according to the first embodiment, and FIG. 2 is a block diagram showing the configuration of a terminal 1000 of FIG. 1.

As shown in FIG. 1, the terminal 1000, a base station 800, a personal computer 600, a television receiver (hereinafter abbreviated as a television) 700 and a server for conversation 2000 are connected to a network 500 in the communication system. A mobile telephone 900 is connected to the network 500 via the base station 800. In this manner, the terminal 1000, the personal computer 600, the television 700 and the mobile telephone 900 are connected to the network 500 as a plurality of communication terminals. The network 500 is the Internet in the present embodiment.

In the example of FIG. 1, a control LSI (Large-Scale Integrated Circuit) 101 (FIG. 2), described below, is incorporated in each of the plurality of communication terminals (the terminal 1000, the personal computer 600, the television 700 and the mobile telephone 900) connected to one another. A conversation program, described below, is stored in an incorporated memory in the control LSI 101 (FIG. 2).

Each communication terminal includes a video input unit, an audio input unit, a video output unit and an audio output unit. The video input unit includes a camera, for example. The audio input unit includes a microphone, for example. The video output unit includes a monitor, for example. The audio output unit includes a speaker, for example.

In the present embodiment, users of the plurality of communication terminals need to register their own unique user information in the server for conversation 2000 in advance. The user information includes a user identifier (hereinafter referred to as a user ID) and a password associated with the user ID. The server for conversation 2000 manages the plurality of users by storing the plurality of pieces of user information of the plurality of users.

To register new user information, a communication terminal transmits the new user information to the server for conversation 2000 together with a request for registration of user information.

The server for conversation 2000 refers to the plurality of pieces of user information stored in advance and determines whether the user ID of the received user information coincides with any of the user IDs that have already been registered.

When the user ID of the received user information does not coincide with any of the registered user IDs, the server for conversation 2000 stores the received user information. When it coincides with any of the registered user IDs, the server for conversation 2000 does not store the received user information. This prevents a plurality of pieces of user information having the same user ID from being registered in the server for conversation 2000.
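The registration rule amounts to a store keyed by user ID that rejects duplicates. A minimal sketch; the class and method names are hypothetical and only model the behavior stated above.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class UserInfo:
    user_id: str
    password: str

class RegistrationStore:
    """Hypothetical model of the registration step of the server for conversation 2000."""

    def __init__(self) -> None:
        self.users: Dict[str, UserInfo] = {}   # registered user information keyed by user ID

    def register(self, info: UserInfo) -> bool:
        """Store `info` only if its user ID is not already registered."""
        if info.user_id in self.users:
            return False                       # duplicate user ID: not stored
        self.users[info.user_id] = info
        return True
```

A second registration with an existing user ID is refused and leaves the first entry unchanged.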

Suppose that a sign-in request is transmitted together with user information from one communication terminal to the server for conversation 2000. In this case, the server for conversation 2000 determines whether the received user information coincides with any of the stored pieces of user information, that is, whether the user information received from the one communication terminal has already been registered.

When the received user information is already registered, the server for conversation 2000 determines whether a sign-in using the same user information is currently active, in order to prevent a plurality of users from signing in with the same user information.

When no sign-in using the same user information is active, the server for conversation 2000 permits the user to sign in. When the user information received from the one communication terminal is not registered, or when a sign-in using the same user information is already active, the server for conversation 2000 does not permit the user to sign in.
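The two sign-in checks (registered, and not already signed in) can be sketched as follows. This is a hypothetical model; the class name and the plain password comparison are illustrative assumptions.

```python
from typing import Dict, Set

class SignInServer:
    """Hypothetical sketch of the sign-in decision of the server for conversation 2000."""

    def __init__(self, registered: Dict[str, str]) -> None:
        self.registered = registered       # user ID -> password, registered in advance
        self.signed_in: Set[str] = set()   # user IDs with an active sign-in

    def sign_in(self, user_id: str, password: str) -> bool:
        if self.registered.get(user_id) != password:
            return False                   # user information is not registered
        if user_id in self.signed_in:
            return False                   # same user information is already signed in
        self.signed_in.add(user_id)
        return True
```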

When the user signs in to the server for conversation 2000, the address (an Internet Protocol address, for example) of the communication terminal used by the user is transmitted from that communication terminal to the server for conversation 2000. In addition, the communication terminal transmits a request for continuation of the sign-in to the server for conversation 2000 at a given interval.

Accordingly, the server for conversation 2000 manages the users currently signed in and the addresses of the communication terminals they use.

After signing in to the server for conversation 2000 using the one communication terminal, the user can transmit a request for conversation with another user, together with the user ID of the other user, to the server for conversation 2000. In this case, the server for conversation 2000 determines, based on the received user ID, whether the other user has signed in.

When the other user has signed in, the server for conversation 2000 transmits the address of the communication terminal used by the other user to the one communication terminal. The one communication terminal then accesses the other communication terminal using the received address. This allows various types of data, including video data and audio data, to be communicated between the one communication terminal and the other communication terminal.

When the other user has not signed in, the server for conversation 2000 transmits information indicating this to the one communication terminal. In this case, information indicating that conversation with the other user is not possible is presented to the user by the monitor or the speaker of the one communication terminal.
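The address management and conversation-request lookup described above can be sketched as a small presence table. The names are hypothetical, and the expiry rule is an assumption: the text only states that continuation requests are sent at a given interval, so this sketch treats a sign-in as ended when no request arrives within `timeout` seconds.

```python
from typing import Dict, Optional, Tuple

class PresenceTable:
    """Hypothetical sketch of sign-in and address tracking in the server for conversation 2000."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout  # assumed: seconds without a continuation request before expiry
        self.entries: Dict[str, Tuple[str, float]] = {}  # user ID -> (address, last request time)

    def sign_in(self, user_id: str, address: str, now: float) -> None:
        self.entries[user_id] = (address, now)

    def keep_alive(self, user_id: str, now: float) -> None:
        """Handle a periodic request for continuation of the sign-in."""
        if user_id in self.entries:
            address, _ = self.entries[user_id]
            self.entries[user_id] = (address, now)

    def request_conversation(self, target_id: str, now: float) -> Optional[str]:
        """Return the target terminal's address, or None if the target is not signed in."""
        entry = self.entries.get(target_id)
        if entry is None or now - entry[1] > self.timeout:
            return None
        return entry[0]
```

A `None` result corresponds to the case in which the one communication terminal presents that conversation with the other user is not possible.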

In addition, by signing in to the server for conversation 2000 using the one communication terminal, the user can accept access from other communication terminals.

Communication of the video data based on video of each user shot by the camera and the audio data based on audio of each user input to the microphone is performed among the plurality of communication terminals. This allows the user of each communication terminal to talk with a user of another communication terminal.

Next, the configuration of the terminal 1000 will be described. As shown in FIGS. 1 and 2, the terminal 1000 includes the communication device 100, a camera/microphone device 200, a television 300 and two remote controllers 400, 490.

As shown in FIG. 2, the communication device 100 includes the control LSI 101, a network interface 103, a wireless receiver 104, a universal serial bus (hereinafter referred to as a USB) interface 105, a power supplier 106, a high-definition multimedia interface (hereinafter referred to as an HDMI) 107, an optical disk drive 108, a memory slot 109, a fluorescent display tube (hereinafter referred to as an FL display) 191, a light emitting diode unit (hereinafter referred to as an LED unit) 192, a buzzer 193 and a flash memory 112. A memory card 110 is inserted in the memory slot 109. The network interface 103 of the communication device 100 is connected to the network 500 through a network cable (a local area network cable, for example) in the present embodiment.

The control LSI 101 includes a CPU (Central Processing Unit) and a memory, and is realized by an integrated circuit using semiconductors. As described above, the conversation program, described below, is stored together with a system program of the communication device 100 in the memory of the control LSI 101. The CPU executes each program stored in the memory, thereby causing the control LSI 101 to execute various types of processing. The control LSI 101 controls the operation of each structural element of the communication device 100 and controls communication with the other communication terminals (the personal computer 600, the television 700, the mobile telephone 900, etc. of FIG. 1). The detailed configuration of the control LSI 101 will be described below.

The network interface 103 is connected to the network 500 through the network cable. The network interface 103 causes various types of data including the video data and the audio data to be applied from the network 500 to the control LSI 101 of the communication device 100. Various types of data including the video data and the audio data are also applied from the control LSI 101 of the communication device 100 to the network 500.

The remote controller 400 transmits an operation signal to the communication device 100 by wireless communication (infrared communication, for example), as described below. The wireless receiver 104 receives the operation signal wirelessly transmitted from the remote controller 400. The operation signal received by the wireless receiver 104 is applied to the control LSI 101.

The USB interface 105 is connected to the camera/microphone device 200 through a USB cable. The power supplier 106 includes an outlet, for example, and is connected to a household power supply. The power supplier 106 supplies power obtained from the household power supply to each structural element of the communication device 100. The HDMI 107 is connected to the television 300 through an HDMI cable. The optical disk drive 108 writes and reads data in and from an optical disk.

The memory slot 109 is configured such that the memory card 110 can be inserted therein and ejected therefrom. With the memory card 110 inserted in the memory slot 109, the control LSI 101 can read data stored in the memory card 110. Moreover, the control LSI 101 can write data in the memory card 110.

The flash memory 112 is connected to the control LSI 101. Another nonvolatile memory may be used instead of the flash memory 112. The flash memory 112 stores the user information of the user using the terminal 1000 and a list of the user IDs of the other users (hereinafter referred to as a contact list), for example. The flash memory 112 stores data (video data and audio data for indicating that the user is absent, for example) that is to be applied to other communication terminals in response to accesses from the other communication terminals when the user cannot accept the accesses from the other communication terminals.

The communication device 100 has a box-shaped casing, for example. The control LSI 101, the network interface 103, the wireless receiver 104, the USB interface 105, the power supplier 106, the HDMI 107, the optical disk drive 108 and the memory slot 109 are built into the casing. The FL display 191, the LED unit 192 and the buzzer 193 are attached to the casing.

The FL display 191 is composed of a fluorescent display tube of seven segments or a fluorescent display tube of fourteen segments, for example. Information indicating current time, reproduction time of the optical disk and so on is applied from the control LSI 101 to the FL display 191. The FL display 191 displays the applied information.

The LED unit 192 generates monochromatic light. Information that indicates lighting or non-lighting is applied from the control LSI 101 to the LED unit 192. The LED unit 192 lights up, goes out or blinks based on the information applied from the control LSI 101.

When information instructing the buzzer 193 to generate an alarm is applied from the control LSI 101, the buzzer 193 generates an alarm sound based on that information.

The remote controller 400 includes an operator 401, a processing circuit 402 and a wireless transmitter 403. The operator 401 includes a power supply button 411, a conversation start button 412, a conversation response button 413, a cross key 414, a determination button 415 of FIG. 1 and a plurality of number buttons not shown. The cross key 414 includes an upper button, a lower button, a left button and a right button. The user operates any button of the operator 401. An operation signal according to the operated button is produced by the processing circuit 402. The produced operation signal is transmitted from the wireless transmitter 403 to the wireless receiver 104 of the communication device 100. As described above, the wireless communication between the communication device 100 and the remote controller 400 is realized by infrared communication, for example.

The camera/microphone device 200 includes a camera 201, a microphone 202, two analog/digital (hereinafter referred to as A/D) converters 203, 204 and a USB interface 205. The USB interface 205 of the camera/microphone device 200 is connected to the USB interface 105 of the communication device 100 through the USB cable.

The camera 201 includes an imaging element. Video of an object is acquired by the imaging element. A video signal in analog form is produced based on the acquired video in the camera 201. The produced video signal is converted to video data in digital form by the A/D converter 203. The video data in digital form is applied to the control LSI 101 of the communication device 100 through the USB interface 205, the USB cable and the USB interface 105.

Audio (sound wave) is input to the microphone 202 from outside. In the microphone 202, an audio signal in analog form is produced based on the input audio. The produced audio signal is converted to audio data in digital form by the A/D converter 204. The audio data in digital form is applied to the control LSI 101 of the communication device 100 through the USB interface 205, the USB cable and the USB interface 105.

The camera/microphone device 200 is used for acquiring video and audio of the user, for example, in the present embodiment.

As described above, the camera/microphone device 200 includes the camera 201 and the microphone 202. The camera 201 and the microphone 202 may individually be connected to the communication device 100 instead of the camera/microphone device 200 that is connected to the communication device 100.

The camera/microphone device 200 may include an HDMI. In this case, the HDMI of the camera/microphone device 200 is connected to the communication device 100 through an HDMI cable. The camera/microphone device 200 may include a wireless transmitter. In this case, the video data and the audio data are applied from the wireless transmitter of the camera/microphone device 200 to the wireless receiver 104 of the communication device 100.

The television 300 includes a monitor 301, a speaker 302, digital/analog (hereinafter referred to as D/A) converters 303, 304, an HDMI 305, a wireless receiver 306 and an audio volume adjuster 310. The HDMI 305 of the television 300 is connected to the HDMI 107 of the communication device 100 through the HDMI cable in the present embodiment.

The video data and audio data in digital form are applied from the control LSI 101 of the communication device 100 to the television 300 through the HDMI 107, the HDMI cable and the HDMI 305. The video data applied to the television 300 is converted to a video signal in analog form by the D/A converter 303. The video signal in analog form is applied to the monitor 301. Thus, video is displayed on the monitor 301.

Meanwhile, the audio data applied to the television 300 is converted to an audio signal in analog form by the D/A converter 304. The audio signal in analog form is applied to the audio volume adjuster 310.

An operation signal for adjusting the level of the audio signal, for example, is input from the remote controller 490, described below, to the audio volume adjuster 310. Thus, the level of the audio signal is adjusted in the audio volume adjuster 310 based on the operation signal applied from the remote controller 490. The adjusted audio signal is applied to the speaker 302, and audio based on the audio signal is output.

As described above, the communication device 100 and the television 300 are connected to each other through the HDMI 107, the HDMI cable and the HDMI 305. In this case, the communication device 100 can apply a control signal for controlling operation of the television 300 to the television 300.

The monitor 301 and the speaker 302 may individually be connected to the communication device 100 instead of the television 300 that is connected to the communication device 100.

The remote controller 490 includes an operator 491, a processing circuit 492 and a wireless transmitter 493. The operator 491 includes a power supply button 481 and audio volume adjustment buttons 484 a, 484 b of FIG. 1. The user operates any button of the operator 491. An operation signal according to the operated button is produced by the processing circuit 492.

When the power supply button 481 is operated, an operation signal for turning on the power supply of the television 300 is produced. When the audio volume adjustment button 484 a is operated, an operation signal for increasing the level of the audio signal is produced. When the audio volume adjustment button 484 b is operated, an operation signal for decreasing the level of the audio signal is produced.

The produced operation signal is wirelessly transmitted from the wireless transmitter 493 to the wireless receiver 306 of the television 300. Wireless communication between the television 300 and the remote controller 490 is realized by infrared communication, for example.
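Although the audio volume adjuster 310 acts on the analog audio signal, its response to the two adjustment buttons can be sketched digitally. The signal names `VOLUME_UP` and `VOLUME_DOWN`, the step size and the level range are assumptions made for illustration.

```python
from typing import List

class AudioVolumeAdjuster:
    """Hypothetical digital model of the audio volume adjuster 310."""
    STEP = 1                       # assumed level change per button press
    MIN_LEVEL, MAX_LEVEL = 0, 100  # assumed level range

    def __init__(self, level: int = 20) -> None:
        self.level = level

    def on_operation_signal(self, signal: str) -> int:
        if signal == "VOLUME_UP":      # from audio volume adjustment button 484 a
            self.level = min(self.MAX_LEVEL, self.level + self.STEP)
        elif signal == "VOLUME_DOWN":  # from audio volume adjustment button 484 b
            self.level = max(self.MIN_LEVEL, self.level - self.STEP)
        return self.level

    def apply(self, samples: List[float]) -> List[float]:
        """Scale samples by the current level before they reach the speaker 302."""
        gain = self.level / self.MAX_LEVEL
        return [s * gain for s in samples]
```

Clamping at both ends keeps repeated button presses from driving the level outside its range.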

In the foregoing terminal 1000, the video data and the audio data are applied from the camera/microphone device 200, for example, to the communication device 100. In the communication device 100, the applied video data and audio data are encoded. During conversation operation of the terminal 1000, described below, the communication device 100 transmits the encoded video data and audio data to the other communication terminals (the personal computer 600, the television 700 and the mobile telephone 900) connected to the network 500.

When the user has signed in to the server for conversation 2000 of FIG. 1, video data and audio data transmitted from the other communication terminals connected to the network 500 are received by the communication device 100 in the terminal 1000. In the communication device 100, the received video data and audio data are decoded. During the conversation operation of the terminal 1000, described below, the communication device 100 applies the decoded video data and audio data to the television 300. In the television 300, the video data in digital form is converted to the video signal in analog form, and the video based on the converted video signal is displayed on the monitor 301. The audio data in digital form is converted to the audio signal in analog form, and the audio based on the converted audio signal is output from the speaker 302.

This causes communication of the video data and audio data to be performed between the terminal 1000 and the other communication terminals as described above. Similarly to the terminal 1000, each of the plurality of communication terminals connected to the network 500 includes a camera, a microphone, a monitor and a speaker. This allows the user using the terminal 1000 to talk with users using the other communication terminals.

In the communication device 100 of the terminal 1000, video data and audio data read from the optical disk by the optical disk drive 108, for example, are applied to the television 300. In the television 300, the video data in digital form is converted to a video signal in analog form, and video based on the converted video signal is displayed on the monitor 301. Also, the audio data in digital form is converted to an audio signal in analog form, and audio based on the converted audio signal is output from the speaker 302.

Furthermore, video data and audio data received from the network 500, for example, are written in the memory card 110 in the communication device 100 of the terminal 1000.

(2) Outline of Operation of the Terminal 1000 by the User

In the following description, the video signal is applied from the communication device 100 to the monitor 301 of the television 300 through the D/A converter 303. The monitor 301 of the television 300 displays video based on the applied video signal. The audio signal is applied from the communication device 100 to the speaker 302 of the television 300 through the D/A converter 304. The speaker 302 of the television 300 outputs audio based on the applied audio signal.

The outline of the operation of the terminal 1000 by the user will be described below together with the video displayed on the monitor 301 of the television 300.

To sign in to the server for conversation 2000 using the terminal 1000, a user operates the conversation start button 412 of FIG. 1, for example. This causes an operation signal indicating that the conversation program is to be executed to be applied from the remote controller 400 to the communication device 100. When the conversation program is executed, a sign-in screen is displayed on the monitor 301 of the television 300.

FIG. 3 is a diagram showing the sign-in screen. As shown in FIG. 3, an input frame f1 for user ID, an input frame f2 for password, a sign-in button b1 and a registration screen button b2 are displayed on the sign-in screen SC1. The user can select the input frames f1, f2, the sign-in button b1 and the registration screen button b2 by operating the cross key 414 of the remote controller 400 of FIG. 1. The user can input text in the input frames f1, f2 using the plurality of number buttons and the like, not shown, of the remote controller 400 of FIG. 1.

After inputting a user ID and a password in the respective input frames f1, f2, the user selects the sign-in button b1 and operates the determination button 415 of the remote controller 400 of FIG. 1. Thus, signing-in to the server for conversation 2000 is executed.

To register user information in the server for conversation 2000, the user instead selects the registration screen button b2 and operates the determination button 415 of the remote controller 400 of FIG. 1. In this case, a user registration screen is displayed on the monitor 301 of the television 300.

FIG. 4 is a diagram showing the user registration screen. As shown in FIG. 4, an input frame f3 for user name, the input frame f1 for user ID, the input frame f2 for password and a registration button b3 are displayed on the user registration screen SC2. The user can select the input frames f1, f2, f3 and the registration button b3 by operating the cross key 414 of the remote controller 400 of FIG. 1. After inputting the user ID, password and name in the respective input frames f1, f2, f3, the user selects the registration button b3 and operates the determination button 415 of the remote controller 400 of FIG. 1. This causes the user information to be registered in the server for conversation 2000.

When the sign-in to the server for conversation 2000 is completed, a contact screen is displayed on the monitor 301 of the television 300.

FIG. 5 is a diagram showing the contact screen. As described above, the contact list is stored in the flash memory 112 of the communication device 100 of FIG. 2. As shown in FIG. 5, a contact list button b4 for displaying the contact list stored in the flash memory 112 is displayed on the contact screen SC3. In this state, the user selects the contact list button b4, and operates the determination button 415 of the remote controller 400 of FIG. 1. In this case, a contact list screen is displayed on the monitor 301 of the television 300.

FIG. 6 is a diagram showing the contact list screen. As shown in FIG. 6, a plurality of user ID buttons b5 associated with the plurality of user IDs, respectively, stored in the communication device 100, for example, are displayed on the contact list screen SC4. The user selects any of the plurality of user ID buttons b5 and operates the determination button 415 of the remote controller 400 of FIG. 1.

This causes the selected user ID to be transmitted to the server for conversation 2000 as a conversation request. When the user having the selected user ID has already signed in, the communication device 100 of the terminal 1000 acquires from the server for conversation 2000 the address of the other communication terminal used for that sign-in.

Thereafter, a request signal indicating a request for conversation is transmitted from the communication device 100 of the terminal 1000 to the other communication terminal using the acquired address. When the communication device 100 receives a response signal from the other communication terminal, a conversation screen is displayed on the monitor 301 of the television 300.

FIG. 7 is a diagram showing the conversation screen. As shown in FIG. 7, a conversation partner display window W1 and a self-display window W2 are displayed on the conversation screen SC5. Video of the other user shot by the camera of the other communication terminal is displayed on the conversation partner display window W1. Video of the user shot by the camera 201 of the camera/microphone device 200 is displayed on the self-display window W2. Audio of the user of the terminal 1000 is input to the microphone 202 of the camera/microphone device 200. Audio of the other user input to the microphone of the other communication terminal is output from the speaker 302 of the television 300.

As described above, when the conversation screen SC5 is displayed on the monitor 301 of the television 300 in the terminal 1000, the conversation screen SC5 is also displayed on the monitor of the communication terminal of the other user. The video of the user of the terminal 1000 and the video of the user of the other communication terminal are displayed on the monitor of the communication terminal of the other user. The audio of the other user is input to the microphone of the other communication terminal. The audio of the user of the terminal 1000 is output from the speaker of the other communication terminal.

In some cases, a request signal indicating a request for conversation is transmitted from the communication terminal of the other user while the contact screen SC3 of FIG. 5 is displayed on the monitor 301. In this case, an incoming call screen is displayed on the monitor 301 of the television 300.

FIG. 8 is a diagram showing the incoming call screen. In this case, a video response button b6 is displayed together with landscape video SS and a plurality of operation buttons on the incoming call screen SC6 as shown in FIG. 8. FIG. 9 is a diagram showing another example of the incoming call screen SC6. In the example of FIG. 9, landscape video is not displayed on the incoming call screen SC6.

The user selects the video response button b6 of the incoming call screen SC6 of FIG. 8 or 9, and operates the determination button 415 of the remote controller 400 of FIG. 1. Alternatively, the user operates the conversation response button 413 of the remote controller 400 of FIG. 1. In this case, a response signal is transmitted from the communication device 100 of the terminal 1000 to the communication terminal of the other user, and the conversation screen SC5 of FIG. 7 is displayed on the monitor 301 of the television 300. In this state, the user can talk with the other user that has transmitted the request signal.

(3) Conversation Operation

The operation of the terminal 1000 based on the conversation program stored in the memory of the control LSI 101 of FIG. 2 (hereinafter referred to as conversation operation) will be described below together with the detailed configuration of the control LSI 101.

FIG. 10 is a block diagram showing the detailed configuration of the control LSI 101 of FIG. 2. FIG. 10 also shows the connection relationship among the control LSI 101, the camera/microphone device 200, the television 300 and the network 500 of FIG. 2. The interfaces at the connection portions are not shown.

As shown in FIG. 10, the control LSI 101 is composed of a control block 101A and a communication block 101B. The control block 101A includes buffers 121 a, 121 b, a decoder 122, a synthesizer 123, an encoder 124, an echo cancellation processing unit 125, a difference level detector 126, a notification signal producer 137 and a controller 129. The communication block 101B includes a communication manager 131, a receiver 132, a packetizer 133 and a transmitter 134.

In the communication block 101B of the control LSI 101, when communication is performed between the terminal 1000 and the other communication terminal, the communication manager 131 detects an encoding method of data that can be decoded in the other communication terminal, and applies to the controller 129 of the control block 101A an instruction signal instructing it to encode data for transmission by the detected encoding method. When video data encoded in the H.264 format can be decoded in the other communication terminal, for example, the communication manager 131 applies to the controller 129 an instruction signal instructing it to encode the video data in the H.264 format. When audio data encoded in the SILK format can be decoded in the other communication terminal, the communication manager 131 applies to the controller 129 an instruction signal instructing it to encode the audio data in the SILK format.
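The format selection by the communication manager 131 can be sketched as picking the first locally encodable format that the other terminal reports it can decode. This is a hypothetical sketch: the preference ordering, the function names and the dictionary standing in for the instruction signal are assumptions; only the H.264 and SILK formats come from the text.

```python
from typing import Dict, Optional, Sequence

def choose_format(encodable: Sequence[str], decodable_by_peer: Sequence[str]) -> Optional[str]:
    """Return the first format this device can encode that the other terminal can decode."""
    for fmt in encodable:            # `encodable` assumed to be in order of local preference
        if fmt in decodable_by_peer:
            return fmt
    return None                      # no common format

def build_instruction(peer_video: Sequence[str], peer_audio: Sequence[str]) -> Dict[str, Optional[str]]:
    """Hypothetical contents of the instruction signal applied to the controller 129."""
    return {
        "video": choose_format(["H.264"], peer_video),
        "audio": choose_format(["SILK"], peer_audio),
    }
```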

The other communication terminal transmits the data (video data and audio data) to the terminal 1000 through the network 500. The receiver 132 of the terminal 1000 receives the data (video data and audio data) transmitted from the other communication terminal. The received data is in packetized form.

The receiver 132 applies the received data to the buffer 121 a of the control block 101A. The data is temporarily stored in the buffer 121 a. The receiver 132 applies a reception signal indicating that the data (video data and audio data) is received to the controller 129.

In this case, the controller 129 applies an instruction signal for instructing to decode the data (video data and audio data) stored in the buffer 121 a to the decoder 122. This causes the data (video data and audio data) stored in the buffer 121 a to be decoded by the decoder 122.

In the following description, the video data decoded by the decoder 122 is referred to as received video data Da, and the audio data decoded by the decoder 122 is referred to as received audio data Db.

The received video data Da is applied to the synthesizer 123. Transmitted video data Dc, described below, is applied from the camera/microphone device 200 to the synthesizer 123. The synthesizer 123 synthesizes the received video data Da and the transmitted video data Dc into synthesized video data E.

The synthesizer 123 applies the produced synthesized video data E to the D/A converter 303 of the television 300. The synthesized video data E in digital form is converted to a synthesized video signal in analog form in the D/A converter 303. This causes video based on the synthesized video signal (the conversation screen SC5 of FIG. 7 or a conversation screen SC5 of FIG. 12, described below, for example) to be displayed on the monitor 301 of the television 300.

The received audio data Db is applied to the D/A converter 304 of the television 300 while being applied to the buffer 121 b in the control block 101A. The received audio data Db in digital form is converted to an audio signal in analog form in the D/A converter 304. The audio signal in analog form is input to the speaker 302 through the audio volume adjuster 310. Audio based on the audio signal is output from the speaker 302.

Video is shot by the camera 201 of the camera/microphone device 200. A video signal based on the shot video is applied to the A/D converter 203. In the A/D converter 203, the video signal in analog form is converted to video data in digital form.

Audio is input to the microphone 202 of the camera/microphone device 200. An audio signal based on the input audio is applied to the A/D converter 204. The audio signal in analog form is converted to audio data in digital form in the A/D converter 204.

In the following description, the video data converted by the A/D converter 203 is referred to as transmitted video data Dc, and the audio data converted by the A/D converter 204 is referred to as transmitted audio data Dd.

Part of the audio output from the speaker 302 of the television 300 is input to the microphone 202 of the camera/microphone device 200 in some cases. Therefore, the echo cancellation processing unit 125 is used in the present embodiment.

When the received audio data Db is applied from the communication device 100 to the television 300, the audio based on the received audio data Db is output from the speaker 302 to be input to the microphone 202. In the camera/microphone device 200, the transmitted audio data Dd based on the input audio is produced to be applied to the communication device 100. In this case, delay (hereinafter an amount of the delay is referred to as an input/output delay amount) occurs between a timing at which the received audio data Db based on the common audio is applied to the television 300 and a timing at which the transmitted audio data Dd based on the common audio is applied to the communication device 100. The buffer 121 b delays the received audio data Db applied from the decoder 122 by a period of time that is equivalent to the input/output delay amount and outputs the received audio data Db to the echo cancellation processing unit 125.
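The role of the buffer 121 b can be pictured as a fixed-length delay line. The following is a minimal sketch that assumes sample-by-sample processing and an input/output delay amount already expressed as a number of samples; the class `DelayBuffer` is hypothetical and not part of the described hardware.

```python
from collections import deque


class DelayBuffer:
    """Fixed-length FIFO that delays samples by delay_samples (sketch of buffer 121 b)."""

    def __init__(self, delay_samples):
        # Pre-fill with silence so the first delay_samples outputs are zero.
        self._fifo = deque([0.0] * delay_samples)

    def push(self, sample):
        """Insert one sample of Db and return the sample received delay_samples earlier."""
        self._fifo.append(sample)
        return self._fifo.popleft()


buf = DelayBuffer(3)
print([buf.push(x) for x in [1, 2, 3, 4, 5]])  # [0.0, 0.0, 0.0, 1, 2]
```

At an assumed sampling rate of 16 kHz, an input/output delay amount of 300 msec would correspond to `DelayBuffer(4800)`.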

The echo cancellation processing unit 125 is provided with the transmitted audio data Dd from the camera/microphone device 200 and the received audio data Db from the buffer 121 b.

The echo cancellation processing unit 125 is controlled by the controller 129. The echo cancellation processing unit 125 receives a start signal from the controller 129, thereby starting echo cancellation processing described below.

Here, the start signal is applied from the controller 129 to the echo cancellation processing unit 125 at a timing at which it is assumed that the user of the terminal 1000 does not generate audio. For example, it is unlikely that the user of the terminal 1000 generates audio immediately after responding to the request signal from the other user. Therefore, when the conversation response button 413 is operated by the user or when the video response button b6 of FIG. 8 is selected, the start signal is applied from the controller 129 to the echo cancellation processing unit 125.

It is unlikely that the user of the terminal 1000 generates audio during output of the audio of the other user from the television 300. Therefore, when the level of the received audio data Db exceeds a given threshold value (hereinafter referred to as an audio threshold value), the start signal is applied from the controller 129 to the echo cancellation processing unit 125.
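The two conditions under which the controller 129 applies the start signal can be summarized as a single predicate. This is a sketch only; the function name and the representation of the level of the received audio data Db as a single number are assumptions.

```python
def should_start_echo_cancellation(response_operated, received_level, audio_threshold):
    """Return True when the controller 129 would apply the start signal."""
    # Case 1: the user has just answered the call (conversation response button 413
    # or video response button b6), so the local user is unlikely to be speaking.
    if response_operated:
        return True
    # Case 2: the other user is talking, i.e. the level of Db exceeds the audio threshold value.
    return received_level > audio_threshold


print(should_start_echo_cancellation(False, 0.4, 0.2))  # True
```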

Upon reception of the start signal from the controller 129, the echo cancellation processing unit 125 detects the level of the transmitted audio data Dd applied from the camera/microphone device 200 and detects the level of the received audio data Db applied from the buffer 121 b.

The echo cancellation processing unit 125 calculates an average value (or a maximum value) of the level of the transmitted audio data Dd detected in a given period, and calculates an average value (or a maximum value) of the level of the received audio data Db detected in the given period, for example. Thereafter, the echo cancellation processing unit 125 amplifies the level of the received audio data Db such that the average value (or the maximum value) of the level of the received audio data Db is the same as the average value (or the maximum value) of the level of the transmitted audio data Dd.

Then, the echo cancellation processing unit 125 performs the echo cancellation processing in which the amplified received audio data Db is subtracted from the applied transmitted audio data Dd. The transmitted audio data Dd after the echo cancellation processing is applied to the encoder 124 while being applied to the difference level detector 126.
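The level matching and subtraction described above can be sketched as follows. The level is assumed here to be the mean (or peak) of absolute sample values over the given period, which the document does not define, and the functions `level` and `echo_cancel` are hypothetical. This mirrors the simple scale-and-subtract scheme described in this embodiment, not the adaptive filters (such as NLMS) used in many practical echo cancellers.

```python
def level(samples, mode="average"):
    """Level over the given period: mean or peak of absolute sample values (assumed definition)."""
    magnitudes = [abs(s) for s in samples]
    if mode == "average":
        return sum(magnitudes) / len(magnitudes)
    if mode == "maximum":
        return max(magnitudes)
    raise ValueError(f"unknown mode: {mode}")


def echo_cancel(dd, db, mode="average"):
    """Amplify delayed Db to the level of Dd, then subtract it from Dd sample by sample."""
    if len(dd) != len(db):
        raise ValueError("Dd and Db must cover the same period")
    ref = level(db, mode)
    gain = level(dd, mode) / ref if ref > 0 else 0.0
    return [d - gain * b for d, b in zip(dd, db)]


db = [0.1, -0.2, 0.3, -0.1]        # received audio data Db (after buffer 121 b)
dd = [0.05, -0.1, 0.15, -0.05]     # transmitted audio data Dd: attenuated echo only
print(echo_cancel(dd, db))         # values very close to zero
```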

FIG. 11 is a diagram for explaining the echo cancellation processing by the echo cancellation processing unit 125. In (a) to (d) of FIG. 11, the ordinate indicates the level of the audio data, and the abscissa indicates time. FIG. 11 (a) shows change of the level of the transmitted audio data Dd over time. FIG. 11 (b) shows change of the level of the received audio data Db over time. FIG. 11 (c) shows change of the level of the received audio data Db amplified by the echo cancellation processing unit 125 over time. FIG. 11 (d) shows change of the level of the transmitted audio data Dd after the echo cancellation processing over time.

The transmitted audio data Dd and the received audio data Db are produced based on the common audio. Therefore, the change of the level of the transmitted audio data Dd over time and the change of the level of the amplified received audio data Db over time are close to each other, as shown in (a) and (c) of FIG. 11. Thus, when the user of the terminal 1000 does not generate audio, the level of the transmitted audio data Dd after the echo cancellation processing ideally becomes substantially zero, as indicated by the solid line in FIG. 11 (d).

However, televisions 300 connected to the communication device 100 do not always have the same specifications. Depending on the audio output characteristics of the speaker 302, the audio output from the speaker 302 may be distorted when its volume is high. In this case, a significant gap occurs between the change of the level of the transmitted audio data Dd over time and the change of the level of the amplified received audio data Db over time, so components of the received audio data Db cannot be sufficiently removed from the transmitted audio data Dd. As a result, the level of the transmitted audio data Dd after the echo cancellation processing increases, as indicated by the dotted line in FIG. 11 (d).

The level of the transmitted audio data Dd after the echo cancellation processing is equivalent to a geometric difference between a waveform represented by the transmitted audio data Dd before the echo cancellation processing and a waveform represented by the received audio data Db.

Therefore, the difference level detector 126 determines whether or not the absolute value of the level of the transmitted audio data Dd after the echo cancellation processing exceeds a given threshold value (hereinafter referred to as a level difference threshold value) TH1 in the present embodiment. The level difference threshold value TH1 is set to an upper limit (an acceptable value) of an acceptable range of the level of the transmitted audio data Dd after the echo cancellation processing. When the absolute value of the level of the transmitted audio data Dd after the echo cancellation processing exceeds the level difference threshold value TH1, the difference level detector 126 applies a detection signal indicating detection of distortion of audio to the controller 129.
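To illustrate why distortion raises the level after echo cancellation, the sketch below models loud, distorted playback as hard clipping (an assumption; the document does not specify the form of distortion), applies a scale-and-subtract step like the one described for the echo cancellation processing unit 125, and then performs the TH1 comparison of the difference level detector 126. The value of TH1, the clipping level and all function names are hypothetical.

```python
import math

TH1 = 0.1  # hypothetical level difference threshold value


def distortion_detected(residual, th1=TH1):
    """Difference level detector 126: True if any |residual sample| exceeds th1."""
    return any(abs(r) > th1 for r in residual)


def residual_after_cancel(dd, db):
    """Match the mean absolute level of Db to Dd, then subtract (simplified echo cancellation)."""
    gain = sum(abs(x) for x in dd) / sum(abs(x) for x in db)
    return [d - gain * b for d, b in zip(dd, db)]


# Received audio Db: one period of a sine wave, 64 samples.
db = [math.sin(2 * math.pi * n / 64) for n in range(64)]
# Undistorted echo: the microphone picks up an attenuated copy.
clean_dd = [0.5 * x for x in db]
# Distorted echo: loud playback modeled as hard clipping at +/-0.6 (assumption).
clipped_dd = [max(-0.6, min(0.6, x)) for x in db]

print(distortion_detected(residual_after_cancel(clean_dd, db)))    # False
print(distortion_detected(residual_after_cancel(clipped_dd, db)))  # True
```

Because clipping changes the waveform shape, no single gain can make the scaled Db match the distorted Dd, so part of the echo remains and the detector fires.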

In this case, the controller 129 applies an instruction signal instructing to produce notification data M to a notification signal producer 127 in response to the detection signal. The notification signal producer 127 produces the notification data M in response to the instruction signal from the controller 129, and applies the produced notification data M to the synthesizer 123. The notification data M includes data for presenting a request for change of an output condition of audio output from the speaker 302 to the user. In the present embodiment, the notification data M includes video data for displaying notification video that requests operation of decreasing the audio volume of the audio output from the speaker 302.

Accordingly, the notification data M is applied from the notification signal producer 127 to the synthesizer 123 together with the received video data Da and the transmitted video data Dc. In this case, the synthesizer 123 synthesizes the received video data Da, the transmitted video data Dc and the notification data M into the synthesized video data E.

The produced synthesized video data E is converted to the synthesized video signal in analog form by the D/A converter 303 of the television 300, so that the conversation screen SC5 including the notification video is displayed on the monitor 301 of the television 300.

FIG. 12 is a diagram showing one example of the conversation screen SC5 including the notification video. As shown in FIG. 12, a notification window W3 in addition to the conversation partner display window W1 and the self-display window W2 is displayed in the conversation screen SC5. The notification video is displayed on the notification window W3.

In the example of FIG. 12, the notification video including the message “Your voice may be hard for your conversation partner to hear. Please turn down the TV volume a little.” is displayed on the notification window W3. When the user decreases the volume of the audio output from the speaker 302 by operating the audio volume adjustment button 484 b of the remote controller 490, distortion of the audio is reduced. As a result, the level of the transmitted audio data Dd after the echo cancellation processing can be brought close to zero.

When the absolute value of the level of the transmitted audio data Dd after the echo cancellation processing is not higher than the level difference threshold value TH1 because of the user operation of the remote controller 490 according to the message of the notification video, the difference level detector 126 does not apply the detection signal to the controller 129. In this case, the notification data M is not produced because the instruction signal is not applied from the controller 129 to the notification signal producer 127. Therefore, the notification window W3 of FIG. 12 including the notification video is not displayed on the conversation screen SC5.

As a result, by seeing that the notification window W3 has disappeared, the user can easily recognize that the audio volume of the audio output from the speaker 302 no longer needs to be adjusted.

When it has not received the start signal from the controller 129, the echo cancellation processing unit 125 of FIG. 10 does not perform the echo cancellation processing. In this case, the echo cancellation processing unit 125 passes the transmitted audio data Dd applied from the camera/microphone device 200 directly to the encoder 124.

During the conversation operation, when the instruction signal regarding encoding is applied from the communication manager 131 to the controller 129, the controller 129 applies a designation signal designating the encoding method according to the applied instruction signal to the encoder 124. Thus, the encoder 124 encodes the transmitted video data Dc and the transmitted audio data Dd in the encoding method designated by the designation signal. The encoded transmitted video data Dc and transmitted audio data Dd are applied to the packetizer 133. The packetizer 133 packetizes the transmitted video data Dc and the transmitted audio data Dd. The packetized transmitted video data Dc and transmitted audio data Dd are transmitted from the transmitter 134 to the communication terminal of the other user through the network 500.

The foregoing functions of the controller 129 are realized by hardware such as a CPU (Central Processing Unit) and a memory and software such as computer programs.

The buffers 121 a, 121 b, the decoder 122, the synthesizer 123, the encoder 124, the echo cancellation processing unit 125, the difference level detector 126, the notification signal producer 127, the communication manager 131, the receiver 132, the packetizer 133 and the transmitter 134 may be realized by hardware such as electronic circuits, or part of these structural elements may be realized by hardware such as a CPU and a memory and software such as computer programs.

(4) The Conversation Program

Description will be made of one example of the processing based on the conversation program according to the first embodiment. FIG. 13 is a flowchart showing one example of the operation of the control LSI 101 based on the conversation program according to the first embodiment. The operation described below is executed repeatedly with a given period while the user is signed in to the server for conversation 2000 using the terminal 1000, for example.

First, the controller 129 of the control LSI 101 of FIG. 10 determines whether or not the user has responded to the request signal from the other communication terminal by operating the remote controller 400 of FIG. 1 (Step S11). More specifically, when the conversation response button 413 of the remote controller 400 of FIG. 1 is operated, the controller 129 determines that the user has responded to the request signal from the other communication terminal based on the operation signal applied from the remote controller 400. Alternatively, when the video response button b6 of FIG. 8 is selected by operation of the cross key 414 of the remote controller 400 of FIG. 1 and the determination button 415 of the remote controller 400 of FIG. 1 is then operated, the controller 129 determines that the user has responded to the request signal from the other communication terminal based on the operation signal applied from the remote controller 400.

When the user has responded to the request signal from the other communication terminal, the echo cancellation processing unit 125 performs the foregoing echo cancellation processing (Step S12).

Next, the difference level detector 126 determines whether or not the level of the transmitted audio data Dd after the echo cancellation processing exceeds the level difference threshold value (Step S13).

When the level of the transmitted audio data Dd after the echo cancellation processing exceeds the level difference threshold value, the notification signal producer 127 of FIG. 10 produces the notification data M (Step S14).

The synthesizer 123 then combines the produced notification data M and the transmitted video data Dc applied from the camera/microphone device 200 with the received video data Da (Step S15). After that, the synthesizer 123 outputs the synthesized video data E obtained from the synthesis to the television 300 (Step S16). This completes one pass of the conversation program.

When the user has not responded to the request signal from the other communication terminal in the foregoing Step S11, the controller 129 determines whether or not the level of the received audio data Db exceeds the audio threshold value (Step S21). That is, the controller 129 determines whether or not the other user has generated audio. When the level of the received audio data Db exceeds the audio threshold value, the echo cancellation processing unit 125 executes the echo cancellation processing of the foregoing Step S12. Meanwhile, when the level of the received audio data Db does not exceed the audio threshold value, the controller 129 returns to the processing of the foregoing Step S11.

When the level of the transmitted audio data Dd after the echo cancellation processing does not exceed the level difference threshold value in the foregoing Step S13, the synthesizer 123 combines the transmitted video data Dc applied from the camera/microphone device 200 with the received video data Da (Step S31). After that, the synthesizer 123 executes the processing of the foregoing Step S16, and outputs the synthesized video data E obtained from the synthesis to the television 300.
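One periodic pass of the flowchart of FIG. 13 can be sketched as follows. The echo cancellation step is passed in as a callable and the video layers are represented by strings; these, together with the function name `conversation_pass`, are illustrative assumptions rather than the internal structure of the control LSI 101.

```python
from typing import Callable, List, Optional


def conversation_pass(
    user_responded: bool,
    received_level: float,
    audio_threshold: float,
    run_echo_cancellation: Callable[[], List[float]],
    th1: float,
    received_video: str,
    transmitted_video: str,
) -> Optional[List[str]]:
    """One pass of the FIG. 13 flow; returns the layers of synthesized video data E."""
    # Steps S11 and S21: continue only if the user just answered or the other user is talking.
    if not user_responded and received_level <= audio_threshold:
        return None  # nothing is output in this period; the flow restarts at Step S11
    residual = run_echo_cancellation()  # Step S12: Dd after echo cancellation
    layers = [received_video, transmitted_video]  # Da and Dc (Step S15 or Step S31)
    if any(abs(r) > th1 for r in residual):  # Step S13
        layers.append("notification M")  # Step S14: notification data M is added
    return layers  # Step S16: E is output to the television 300


print(conversation_pass(True, 0.0, 0.2, lambda: [0.0, 0.3], 0.1, "Da", "Dc"))
# ['Da', 'Dc', 'notification M']
```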

The series of processing according to the foregoing conversation program is repeatedly executed with a given period, so that the notification video is displayed on the monitor 301 by the processing of the foregoing Steps S13 to S15 when the level of the transmitted audio data Dd after the echo cancellation processing exceeds the level difference threshold value.

Meanwhile, when the level of the transmitted audio data Dd after the echo cancellation processing falls to or below the level difference threshold value while the notification video is displayed on the monitor 301, the notification data M is no longer produced because of the processing of the foregoing Steps S13 and S31. This causes the notification video displayed on the monitor 301 to disappear.

Accordingly, the user can easily recognize whether or not the audio volume of the audio output from the speaker 302 needs adjustment by checking whether the notification video is displayed.

(5) Effects

The difference level detector 126 determines whether or not the absolute value of the level of the transmitted audio data Dd after the echo cancellation processing exceeds the level difference threshold value TH1 in the present embodiment. When the absolute value of the level of the transmitted audio data Dd after the echo cancellation processing exceeds the level difference threshold value TH1, the notification data M for displaying the notification video on the monitor 301 is produced. The produced notification data M is combined with the received video data Da and the transmitted video data Dc by the synthesizer 123 to be applied to the monitor 301.

Thus, the notification video is displayed on the monitor 301 of the television 300. In this case, the user can decrease the audio volume of the audio output from the speaker 302 according to the message included in the notification video. This reduces distortion in the audio output from the speaker 302, so the absolute value of the level of the transmitted audio data Dd after the echo cancellation processing can be brought to or below the level difference threshold value TH1.

Accordingly, components of the audio received from the other communication terminal can be sufficiently removed from the transmitted audio data Dd to be transmitted to the other communication terminal. As a result, the sense of strangeness felt by the user of the other communication terminal can be sufficiently suppressed.

[2] Second Embodiment

Description will be made of a communication system according to a second embodiment while referring to differences from the communication system according to the first embodiment. An antenna is connected to the television 300 of the terminal 1000 of FIG. 1 in the communication system according to the second embodiment. The television 300 is configured to be capable of receiving a broadcast signal transmitted from a broadcast station device using the antenna.

(1) Configuration of the Television

FIG. 14 is a block diagram showing the configuration of the television 300 according to the second embodiment. As shown in FIG. 14, the television 300 according to the present embodiment includes the monitor 301, the speaker 302, the D/A converters 303, 304, the HDMI 305, the wireless receiver 306, a video/audio processing circuit 307, a tuner 308 and the audio volume adjuster 310. The antenna 309 is connected to the tuner 308. The HDMI 305 of the television 300 is connected to the HDMI 107 of the communication device 100 through the HDMI cable. The detailed configuration of the communication device 100 will be described below.

The remote controller 490 used with the television 300 of FIG. 14 has the same configuration as the remote controller 490 of FIGS. 1 and 2 except that the operator 491 includes a display mode button 485. Operating the display mode button 485 produces an operation signal, described below, for changing the display mode of the television 300.

In the television 300 of FIG. 14, a broadcast signal transmitted from a broadcast station device is received by the antenna 309. The tuner 308 selects a channel, and demodulates the broadcast signal of the selected channel into video data and audio data. The demodulated video data and audio data are applied to the video/audio processing circuit 307.

The video/audio processing circuit 307 is realized by an LSI, for example. The video/audio processing circuit 307 includes a decoder. The decoder of the video/audio processing circuit 307 decodes the video data and the audio data applied from the tuner 308.

A plurality of display modes that can be displayed on the monitor 301 are set in the television 300. The plurality of display modes include a standard mode, a cinema mode, a dynamic mode and a conversation mode, for example.

The user can select any of the plurality of display modes by operating the display mode button 485 of the remote controller 490. The video/audio processing circuit 307 performs video adjustment processing for adjusting image quality, brightness, etc. of the video on the video data based on the display mode selected by the user.

Different types of video adjustment processing are performed in the respective display modes, so the time required for the video adjustment processing differs among the display modes. Therefore, the video/audio processing circuit 307 synchronizes the timing at which the video data after the video adjustment processing is output to the D/A converter 303 with the timing at which the audio data is output to the D/A converter 304. Specifically, the video/audio processing circuit 307 delays the audio data in order to synchronize it with the video data. The D/A converter 303 converts the video data to a video signal in analog form, and video is displayed on the monitor 301. The D/A converter 304 converts the audio data to an audio signal in analog form, and audio is output from the speaker 302. Since the output timing of the video data coincides with that of the audio data, the timing of the video displayed on the monitor 301 coincides with the timing of the audio output from the speaker 302.

(2) Delay of the Audio Output from the Speaker

During the foregoing conversation operation, the television 300 also performs the video adjustment processing based on the display mode selected by the user. Therefore, when the video adjustment processing corresponding to the selected display mode takes a long time, a significant difference occurs between the timing at which the received audio data Db is applied from the communication device 100 to the television 300 and the timing at which the audio based on the received audio data Db is output from the speaker 302 of the television 300. In this case, the time from when the user of the other communication terminal speaks until that audio is output from the speaker 302 becomes longer. Therefore, although the user can view the video displayed on the monitor 301 in a desired display mode, he or she feels a sense of strangeness in conversation.

The difference between the timing at which the received audio data Db is applied from the communication device 100 to the television 300 and the timing at which the audio is output from the speaker 302 of the television 300 based on the received audio data Db is referred to as an audio delay amount in the following description.

FIG. 15 is a list of audio delay amounts corresponding to the plurality of display modes, respectively. In the example of FIG. 15, the audio delay amounts corresponding to the standard mode, the cinema mode, the dynamic mode and the conversation mode are 200 msec, 400 msec, 300 msec and 100 msec, respectively.

Thus, in the example of FIG. 15, the audio delay amount corresponding to the standard mode is twice as large as the audio delay amount corresponding to the conversation mode, the audio delay amount corresponding to the dynamic mode is three times as large, and the audio delay amount corresponding to the cinema mode is four times as large.
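The values of FIG. 15 and the ratios stated above can be checked with a small lookup table. Only the millisecond values come from FIG. 15; the dictionary keys and helper names are hypothetical.

```python
# Audio delay amounts per display mode, from the example of FIG. 15 (msec).
AUDIO_DELAY_MS = {"standard": 200, "cinema": 400, "dynamic": 300, "conversation": 100}


def ratio_to_conversation(mode):
    """How many times larger the mode's audio delay is than that of the conversation mode."""
    return AUDIO_DELAY_MS[mode] / AUDIO_DELAY_MS["conversation"]


def lowest_delay_mode():
    """Display mode with the smallest audio delay amount."""
    return min(AUDIO_DELAY_MS, key=AUDIO_DELAY_MS.get)


for mode in ("standard", "dynamic", "cinema"):
    print(mode, ratio_to_conversation(mode))
# standard 2.0
# dynamic 3.0
# cinema 4.0
print(lowest_delay_mode())  # conversation
```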

Therefore, when the cinema mode is selected in the television 300 and the input/output delay amount, described with reference to FIG. 10, is large, for example, the display mode of the television 300 is preferably changed to a display mode with a small audio delay amount (the conversation mode, for example). The display mode of the television 300 is changed to the conversation mode, so that the audio delay amount in the television 300 is reduced. This suppresses a sense of strangeness that the user feels during conversation. The control LSI 101 of the communication device 100 has a configuration described below in the present embodiment.

(3) The Detailed Configuration of the Control LSI

FIG. 16 is a block diagram showing the detailed configuration of the control LSI 101 according to the second embodiment. As shown in FIG. 16, the control LSI 101 according to the present embodiment has the same configuration as the control LSI 101 of FIG. 10 except that it further includes a delay amount detector 121 c.

In the control LSI 101, the received audio data Db decoded by the decoder 122 is applied to the delay amount detector 121 c and the buffer 121 b while being applied to the video/audio processing circuit 307 of the television 300. The transmitted audio data Dd is applied from the A/D converter 204 of the camera/microphone device 200 to the delay amount detector 121 c and the echo cancellation processing unit 125.

The delay amount detector 121 c is controlled by the controller 129. The delay amount detector 121 c receives the start signal from the controller 129 to perform the following delay amount detection processing. The start signal is applied from the controller 129 to the delay amount detector 121 c at the same timing as application of the start signal to the foregoing echo cancellation processing unit 125.

Upon reception of the start signal from the controller 129, the delay amount detector 121 c detects the change of the level of the transmitted audio data Dd over time and detects the change of the level of the received audio data Db over time.

Furthermore, the delay amount detector 121 c performs pattern matching of a waveform of the detected received audio data Db and a waveform of the detected transmitted audio data Dd. Thus, the delay amount detector 121 c performs the delay amount detection processing for detecting the foregoing input/output delay amount. The input/output delay amount corresponds to a temporal difference between the waveform of the transmitted audio data Dd and the waveform of the received audio data Db.

Then, the delay amount detector 121 c determines whether or not the detected input/output delay amount exceeds a given threshold value (hereinafter referred to as a delay amount threshold value). The delay amount threshold value is set to an upper limit (an acceptable value) of an acceptable range of the input/output delay amount.

The delay amount detector 121 c applies a detection signal indicating that the input/output delay amount is large to the controller 129 when the input/output delay amount exceeds the delay amount threshold value.
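The document does not specify the pattern matching used by the delay amount detector 121 c. One common way to estimate the temporal offset between two waveforms is cross-correlation, which is swapped in here as an assumption. The sketch searches all lags exhaustively (a practical implementation would more likely use an FFT-based method) and then compares the result with the delay amount threshold value. The function names, the 16 kHz sampling rate and the 200 msec threshold are hypothetical.

```python
import random


def estimate_delay(db, dd, max_lag):
    """Return the lag in samples that maximizes the mean of db[n] * dd[n + lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        pairs = list(zip(db, dd[lag:]))
        if not pairs:
            break  # no overlap left at this lag
        score = sum(b * d for b, d in pairs) / len(pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_too_large(lag_samples, sample_rate_hz, threshold_ms):
    """True if the detected input/output delay amount exceeds the delay amount threshold value."""
    return lag_samples * 1000 / sample_rate_hz > threshold_ms


# Synthetic example: Dd is an attenuated copy of Db arriving 37 samples later.
rng = random.Random(0)
db = [rng.uniform(-1.0, 1.0) for _ in range(400)]
dd = [0.0] * 37 + [0.5 * x for x in db]
print(estimate_delay(db, dd, max_lag=100))  # 37
print(delay_too_large(6400, 16_000, threshold_ms=200))  # True (6400 samples = 400 msec)
```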

In this case, the controller 129 applies an instruction signal instructing to produce the notification data M to the notification signal producer 127 in response to the detection signal. The notification signal producer 127 produces the notification data M in response to the instruction signal from the controller 129, and applies the produced notification data M to the synthesizer 123. The notification data M includes data for presenting to the user the request for change of the output condition of the audio output from the speaker 302. In the present embodiment, the notification data M includes video data for displaying, on the monitor 301, notification video requesting change of the display mode of the television 300.

Thus, the notification data M from the notification signal producer 127 is applied to the synthesizer 123 together with the received video data Da and the transmitted video data Dc. In this case, the synthesizer 123 synthesizes the received video data Da, the transmitted video data Dc and the notification data M into the synthesized video data E.

The produced synthesized video data E is applied to the D/A converter 303 through the video/audio processing circuit 307 of the television 300. The synthesized video data E is converted to a synthesized video signal in analog form by the D/A converter 303, so that the conversation screen SC5 including the notification video is displayed on the monitor 301 of the television 300.

FIG. 17 is a diagram showing another example of the conversation screen SC5 including the notification video. In this example, it is assumed that the cinema mode is selected as the display mode of the television 300. As shown in FIG. 17, the notification window W3 in addition to the conversation partner display window W1 and the self-display window W2 is displayed on the conversation screen SC5. The notification video is displayed on the notification window W3.

In the example of FIG. 17, the notification video including a message saying “change display mode to conversation mode” is displayed on the notification window W3. The display mode of the television 300 is changed from the cinema mode to the conversation mode by the user operating the display mode button 485 of the remote controller 490. Accordingly, the audio delay amount in the television 300 is decreased, and the input/output delay amount is decreased.

(4) Capacity of the Buffer Required for Dealing with Delay of the Audio

FIG. 18 is a conceptual diagram of the echo cancellation processing. As shown in FIG. 18, when the audio delay amount in the television 300 increases, the period of time by which the buffer 121 b delays the received audio data Db for the echo cancellation processing must be lengthened accordingly. The storage capacity of the buffer 121 b therefore has to be increased.
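A minimal sketch of the idea in FIG. 18, assuming the echo is simply the received audio data Db delayed by a known number of samples and scaled by a known gain. The FIFO plays the role of the buffer 121 b and the subtraction stands in for the echo cancellation processing unit 125. Practical echo cancellers usually estimate the echo path adaptively (for example with an NLMS filter); the fixed gain and the class and function names here are assumptions for illustration. The FIFO length equals the delay in samples, which is why a longer audio delay in the television 300 demands more storage.

```python
from collections import deque
from typing import Iterable, List


class DelayedEchoCanceller:
    """Delay the reference (Db) in a FIFO and subtract a scaled copy from the mic input."""

    def __init__(self, delay_samples: int, echo_gain: float) -> None:
        # FIFO pre-filled with silence; its length equals the delay in samples.
        self.buffer = deque([0.0] * delay_samples, maxlen=delay_samples)
        self.echo_gain = echo_gain

    def process(self, reference: float, mic: float) -> float:
        # The oldest buffered sample is the reference from delay_samples ago.
        delayed = self.buffer[0] if self.buffer else reference
        self.buffer.append(reference)  # maxlen discards the oldest sample
        return mic - self.echo_gain * delayed


def cancel(reference: Iterable[float], mic: Iterable[float],
           delay: int, gain: float) -> List[float]:
    """Return the residual after removing the modeled echo sample by sample."""
    ec = DelayedEchoCanceller(delay, gain)
    return [ec.process(r, m) for r, m in zip(reference, mic)]
```

If the assumed delay is shorter than the actual one, the echo is not removed and leaks into the transmitted audio, which is what forces the buffer to cover the longest expected delay.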

The storage capacity of the buffer 121 b is preferably determined based on the delay periods corresponding to the respective display modes of the television 300. However, the televisions 300 connected to the communication device 100 do not always have the same specifications. Therefore, the buffer 121 b needs a large storage capacity in order to allow televisions 300 of various specifications to be connected to the communication device 100.
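The capacity needed by the buffer 121 b scales linearly with the longest delay it must cover. A short worked example, assuming 48 kHz, 16-bit mono PCM and a hypothetical worst-case delay of 300 ms across the televisions to be supported (neither value is given in the source):

```python
import math


def buffer_bytes(max_delay_ms: float, sample_rate_hz: int = 48_000,
                 bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes needed to hold max_delay_ms of PCM audio, rounded up to whole samples."""
    samples = math.ceil(max_delay_ms * sample_rate_hz / 1000)
    return samples * bytes_per_sample * channels


print(buffer_bytes(300))  # 14400 samples x 2 bytes = 28800
print(buffer_bytes(50))   # 2400 samples x 2 bytes = 4800
```

Under these assumptions, a buffer sized only for a 50 ms conversation-mode delay needs one sixth of the storage required to cover a 300 ms worst case.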

As described above, in the control LSI 101 of FIG. 16, the notification video for decreasing the input/output delay amount is presented to the user based on the detection by the delay amount detector 121 c. The user is thus prompted to change the display mode to the conversation mode, which decreases the input/output delay amount and eliminates the need to increase the storage capacity of the buffer 121 b. As a result, an increase in the cost of the control LSI 101 is suppressed.

(5) The Conversation Program

An example of processing based on the conversation program according to the second embodiment will now be described. FIG. 19 is a flowchart showing an example of the operation of the control LSI 101 based on the conversation program according to the second embodiment. The operation described below is executed repeatedly at a given period once the user has signed in to the server for conversation 2000 using the terminal 1000, for example.

First, the controller 129 of the control LSI 101 of FIG. 16 determines whether or not the user has responded to the request signal from the other communication terminal by operating the remote controller 400 of FIG. 1 (Step S41).

When the user has responded to the request signal from the other communication terminal, the delay amount detector 121 c performs the foregoing delay amount detection processing (Step S42).
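The delay amount detection processing itself is described earlier in the document. Purely as an illustration, one common way to estimate such a delay is to find the lag that maximizes the cross-correlation between the received audio data Db and the microphone input; the sketch below assumes that technique, which may differ from the method actually used by the delay amount detector 121 c. The threshold comparison corresponds to Step S43; the function names are hypothetical.

```python
from typing import Sequence


def estimate_delay(reference: Sequence[float], captured: Sequence[float],
                   max_lag: int) -> int:
    """Return the lag (in samples) at which captured best matches reference.

    Brute-force cross-correlation: for each lag from 0 to max_lag, sum
    reference[n] * captured[n + lag] over the overlapping samples.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(
            reference[n] * captured[n + lag]
            for n in range(len(reference))
            if n + lag < len(captured)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def exceeds_threshold(lag: int, sample_rate_hz: int, threshold_ms: float) -> bool:
    """Step S43: does the detected delay, converted to ms, exceed the threshold?"""
    return lag * 1000 / sample_rate_hz > threshold_ms
```

A brute-force search costs O(N x max_lag); an FFT-based correlation would be the usual choice for long buffers, but the simple loop keeps the technique visible.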

Next, the delay amount detector 121 c determines whether or not the detected input/output delay amount exceeds the delay amount threshold value (Step S43).

When the input/output delay amount exceeds the delay amount threshold value, the notification signal producer 127 of FIG. 16 produces the notification data M (Step S44).

The synthesizer 123 then combines the produced notification data M and the transmitted video data Dc applied from the camera/microphone device 200 of FIG. 16 with the received video data Da (Step S45). The synthesizer 123 then outputs the resulting synthesized video data E to the television 300 (Step S46), and the processing based on the conversation program ends.

When the user has not responded to the request signal from the other communication terminal in the foregoing Step S41, the controller 129 determines whether or not the level of the received audio data Db exceeds the audio threshold value (Step S51). When the level of the received audio data Db exceeds the audio threshold value, the delay amount detector 121 c executes the delay amount detection processing of the foregoing Step S42. Meanwhile, when the level of the received audio data Db does not exceed the audio threshold value, the controller 129 executes the processing of the foregoing Step S41.

When the detected input/output delay amount does not exceed the delay amount threshold value in Step S43, the synthesizer 123 combines the transmitted video data Dc applied from the camera/microphone device 200 with the received video data Da (Step S61). The synthesizer 123 then executes Step S46, outputting the resulting synthesized video data E to the television 300.
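The branching of FIG. 19 can be summarized as a single function executed once per period. The `PeriodInputs` fields and the `detect_delay` callable, which stands in for the delay amount detector 121 c, are hypothetical placeholders; the returned list names the layers that the synthesizer 123 combines into E, and the step numbers in the comments refer to FIG. 19.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PeriodInputs:
    """Hypothetical snapshot of the values checked in one pass of FIG. 19."""
    user_responded: bool      # S41: user answered the other terminal's request
    db_level: float           # level of the received audio data Db
    audio_threshold: float    # audio threshold value used in S51
    delay_threshold: float    # delay amount threshold value used in S43


def conversation_step(inp: PeriodInputs,
                      detect_delay: Callable[[], float]) -> Optional[List[str]]:
    """Run one pass of the FIG. 19 flow.

    Returns the names of the layers synthesized into E and output in S46,
    or None when no detection is performed in this pass.
    """
    # S41 -> S51: proceed only if the call was answered or Db is loud enough.
    if not inp.user_responded and inp.db_level <= inp.audio_threshold:
        return None  # flow returns to S41; retried in the next period
    delay = detect_delay()              # S42: delay amount detection
    if delay > inp.delay_threshold:     # S43
        return ["Da", "Dc", "M"]        # S44 (produce M), S45 (synthesize)
    return ["Da", "Dc"]                 # S61 (synthesize without M)
```

Because the function is re-run every period, the notification layer drops out on its own as soon as the detected delay falls to or below the threshold.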

Because the series of processing based on the conversation program is repeated at a given period, the notification video is displayed on the monitor 301 through Steps S43 to S45 whenever the input/output delay amount exceeds the delay amount threshold value.

Conversely, when the input/output delay amount falls to or below the delay amount threshold value while the notification video is displayed on the monitor 301, the notification data M is no longer produced, as a result of Steps S43 and S61. The notification video displayed on the monitor 301 therefore disappears.

Accordingly, the user can easily recognize whether or not the display mode of the television 300 needs to be changed to the conversation mode by checking whether the notification video is displayed.

(6) Effects

The delay amount detector 121 c detects the input/output delay amount, and determines whether or not the detected input/output delay amount exceeds the delay amount threshold value in the present embodiment. When the input/output delay amount exceeds the delay amount threshold value, the notification data M for displaying the notification video on the monitor 301 is produced. The produced notification data M is combined with the received video data Da and the transmitted video data Dc by the synthesizer 123 to be applied to the monitor 301.

Thus, the notification video is displayed on the monitor 301 of the television 300, and the user can change the display mode of the television 300 to the conversation mode according to the message included in the notification video. Changing the display mode to the conversation mode minimizes the audio delay amount in the television 300, which allows the input/output delay amount to be sufficiently decreased.

As a result, the sense of strangeness felt by the user of each communication terminal during conversation using the plurality of communication terminals can be sufficiently suppressed.

[3] Modifications

(1) In the first and second embodiments, when the detection signal is applied from the difference level detector 126, the controller 129 may apply a control signal for decreasing the level of the audio signal to the audio volume adjuster 310 of the television 300, as indicated by the dash-dot lines in FIGS. 10 and 16. In this case, the audio volume adjuster 310 decreases the level of the audio signal based on the control signal applied from the controller 129. The volume of the audio output from the speaker 302 is thereby decreased, reducing distortion of the audio. As a result, the sense of strangeness that the user feels during conversation is sufficiently suppressed without requiring the user to operate the remote controller 490.
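A sketch of modification (1), assuming the level of the echo-cancelled residual is measured as an RMS value in dBFS over one block and that the controller lowers the volume in fixed steps. The step size, the volume range and the function names are hypothetical, and the level difference threshold value TH1 is passed in as a parameter because its value is not given in this section.

```python
import math
from typing import Sequence


def rms_dbfs(block: Sequence[float]) -> float:
    """RMS level of a block of samples in [-1, 1], in dB relative to full scale."""
    if not block:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in block) / len(block))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def adjust_volume(residual: Sequence[float], th1_db: float, volume: int,
                  step: int = 2, min_volume: int = 0) -> int:
    """Return the volume setting to send to the audio volume adjuster 310.

    If the residual left after echo cancellation is louder than TH1, the
    echo path is assumed to be distorting, so the volume is lowered by one step.
    """
    if rms_dbfs(residual) > th1_db:
        return max(min_volume, volume - step)
    return volume
```

The residual is used as the measure because a linear echo canceller can only remove the undistorted component of the echo; whatever distortion the speaker adds tends to remain in the residual.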

When the control signal is applied from the controller 129 to the audio volume adjuster 310 in this manner, the notification signal producer 127 of FIG. 10 need not be provided. Alternatively, the notification signal producer 127 of FIG. 10 may produce the following notification data M.

In this example, the notification data M may include video data of notification video indicating that the audio volume of the audio output from the speaker 302 has been decreased in order to reduce distortion of the audio.

In this case, when the volume of the audio output from the speaker 302 is decreased in response to the detection signal applied from the difference level detector 126 to the controller 129, the notification video indicating that the audio volume has been decreased in order to reduce distortion of the audio is displayed on the monitor 301. By viewing this notification video, the user can easily recognize that the audio volume has been decreased in order to reduce distortion of the audio.

The notification data M may include audio data of an audio guide notifying that the audio volume has been decreased in order to reduce distortion of the audio.

In this case, when the audio volume of the audio output from the speaker 302 is decreased by application of the detection signal from the difference level detector 126 to the controller 129, the audio guide notifying that the audio volume has been decreased in order to reduce distortion of the audio is output from the speaker 302. Accordingly, the user can easily recognize that the audio volume has been decreased in order to reduce distortion of the audio by listening to the foregoing audio guide.

(2) In the second embodiment, when the detection signal is applied from the delay amount detector 121 c, the controller 129 may apply, to the video/audio processing circuit 307 of the television 300, a control signal for changing the display mode of the television 300 to the conversation mode, as indicated by the dash-dot line in FIG. 16. In this case, the video/audio processing circuit 307 performs the video adjustment processing corresponding to the conversation mode based on the control signal applied from the controller 129. The audio delay amount in the television 300 can thus be minimized, which sufficiently decreases the input/output delay amount. As a result, the sense of strangeness that the user feels during conversation can be sufficiently suppressed without requiring the user to operate the remote controller 490.

When the control signal is applied from the controller 129 to the video/audio processing circuit 307 in this manner, the notification signal producer 127 of FIG. 16 need not be provided. Alternatively, the notification signal producer 127 of FIG. 16 may produce the following notification data M.

In this example, the notification data M may include video data of notification video indicating that the display mode of the television 300 has been changed to the conversation mode in order to decrease the audio delay amount.

In this case, when the display mode of the television 300 is changed to the conversation mode in response to the detection signal applied from the delay amount detector 121 c to the controller 129, the notification video indicating that the display mode of the television 300 has been changed to the conversation mode in order to decrease the audio delay amount is displayed on the monitor 301. By viewing this notification video, the user can easily recognize that the display mode has been changed to the conversation mode in order to decrease the audio delay amount.

The notification data M may include audio data of an audio guide notifying that the display mode of the television 300 has been changed to the conversation mode in order to decrease the audio delay amount.

In this case, when the display mode of the television 300 is changed to the conversation mode in response to the detection signal applied from the delay amount detector 121 c to the controller 129, the audio guide notifying that the display mode of the television 300 has been changed to the conversation mode in order to decrease the audio delay amount is output from the speaker 302. By listening to this audio guide, the user can easily recognize that the display mode has been changed to the conversation mode in order to decrease the audio delay amount.

(3) When the detection signal is applied from the delay amount detector 121 c, the controller 129 may apply a control signal for minimizing or eliminating delay of the audio data to the video/audio processing circuit 307 as indicated by the one-dot and dash line in FIG. 16 in the second embodiment.

In this case, the video/audio processing circuit 307 minimizes or eliminates delay of the audio data based on the control signal applied from the controller 129. This sufficiently decreases the input/output delay amount. As a result, the sense of strangeness that the user feels during conversation is sufficiently suppressed without requiring the user to operate the remote controller 490.

When the control signal is applied from the controller 129 to the video/audio processing circuit 307 in this manner, the notification signal producer 127 of FIG. 16 need not be provided. Alternatively, the notification signal producer 127 of FIG. 16 may produce the following notification data M.

In this example, the notification data M may include video data of notification video indicating that the delay of the audio data in the television 300 has been decreased in order to decrease the input/output delay amount. The notification data M may also include audio data of an audio guide giving the same notification.

Thus, by viewing the notification video or listening to the audio guide, the user can easily recognize that the delay of the audio data in the television 300 has been decreased in order to decrease the input/output delay amount.

(4) In the second embodiment, when the conversation mode is selected, the video/audio processing circuit 307 of FIG. 14 need not synchronize the timing at which the video data after the video adjustment processing is output to the D/A converter 303 with the timing at which the audio data is output to the D/A converter 304.

In this case, when the synthesized video data E and the received audio data Db are input from the communication device 100 to the video/audio processing circuit 307, the received audio data Db is output to the D/A converter 304 without being affected by the video adjustment processing. Thus, the audio delay amount in the television 300 can sufficiently be decreased. As a result, the input/output delay amount is sufficiently decreased.

(5) In the second embodiment, the delay amount detector 121 c of FIG. 16 may apply the detected input/output delay amount to the controller 129. In this case, the controller 129, instead of the delay amount detector 121 c, may determine whether or not the input/output delay amount exceeds the delay amount threshold value.

(6) In the second embodiment, when the audio output from the speaker 302 of the television 300 is not input to the microphone 202 of the camera/microphone device 200, the echo cancellation processing unit 125 and the difference level detector 126 need not be provided in the control LSI 101 of FIG. 16.

(7) In the first and second embodiments, the notification data M is video data of notification video encouraging the user of the terminal 1000, for example, to perform specific operations on the television 300; however, the present invention is not limited to this. The notification data M may be audio data of audio encouraging the user of the terminal 1000, for example, to perform the specific operations on the television 300.

In this case, the notification data M is input to the television 300, for example, so that an audio guide saying “turn down audio volume”, “set display mode to conversation mode” or the like is output from the speaker 302.

[4] Correspondences Between Elements in the Claims and Parts in Embodiments

In the following paragraphs, non-limiting examples of correspondences between various elements recited in the claims below and those described above with respect to various preferred embodiments of the present invention are explained.

In the above-described embodiments, the communication device 100 is an example of a communication device, the personal computer 600, the television 700 and the mobile telephone 900 are examples of another device, the configuration including the microphone 202 and the A/D converter 204 is an example of an audio input device, and the configuration including the speaker 302, the audio volume adjuster 310, the D/A converter 304 and the video/audio processing circuit 307 is an example of an audio output device.

The network interface 103, the receiver 132 and the buffer 121 a are an example of a receiver, the HDMI 107 and the decoder 122 are an example of an audio data output unit, the USB interface 105, the echo cancellation processing unit 125 and the delay amount detector 121 c are an example of an audio data input unit, and the network interface 103, the packetizer 133 and the transmitter 134 are an example of a transmitter.

The echo cancellation processing unit 125 and the delay amount detector 121 c are examples of a difference detector, the difference level detector 126, the delay amount detector 121 c and the controller 129 are examples of a determiner, and the synthesizer 123, the notification signal producer 127 and the controller 129 are examples of a presentation signal producer.

The level difference threshold value TH1 and the delay amount threshold value are examples of an acceptable amount, the notification data M is an example of a presentation signal, the echo cancellation processing unit 125 is an example of a removal processing unit, the difference level detector 126 is an example of each of a level detector and a determiner, and the standard mode, the cinema mode, the dynamic mode and the conversation mode are examples of a plurality of display modes.

The configuration including the monitor 301, the D/A converter 303 and the video/audio processing circuit 307 is an example of a video output device, the HDMI 107 and the decoder 122 are examples of a video data output unit, the delay amount detector 121 c is an example of a delay detector, and the controller 129 is an example of a control signal producer.

Various other elements having the configurations or functions recited in the claims can also be used as the respective elements recited in the claims.

1. A communication device capable of transmitting and receiving video data and audio data to and from another device and being connected to an audio input device and an audio output device, comprising: a receiver configured to be capable of receiving the audio data transmitted from said another device; an audio data output unit arranged to output the audio data received by said receiver to said audio output device; an audio data input unit to which audio data is input from said audio input device; a transmitter configured to be capable of transmitting the audio data input by said audio data input unit to said another device; a difference detector arranged to detect a geometric or temporal difference between a waveform represented by the audio data received by said receiver and a waveform represented by the audio data input by said audio data input unit when audio is output by said audio output device based on the audio data output from said audio data output unit and audio data based on said output audio is input from said audio input device to said audio data input unit; a determiner arranged to determine whether or not the difference detected by said difference detector exceeds a predetermined acceptable amount; and a presentation signal producer arranged to produce a presentation signal for presenting to a user a request for change of an output condition of the audio based on the audio data output by said audio data output unit when said determiner determines that the difference exceeds said acceptable amount.
 2. The communication device according to claim 1, wherein said difference detector includes: a removal processing unit arranged to perform processing of removing components corresponding to the audio data received by said receiver from the audio data input to said audio data input unit; and a level detector arranged to detect a level of the audio data processed by said removal processing unit as said difference when the audio is output by said audio output device based on the audio data output from said audio data output unit and the audio data based on said output audio is input from said audio input device to said audio data input unit, and said presentation signal includes a request for adjustment of audio volume of said audio output device as the request for change of the output condition of said audio.
 3. The communication device according to claim 2, wherein said presentation signal includes a request for decrease of the audio volume of said audio output device as the request for change of the output condition of said audio.
 4. The communication device according to claim 2, which is capable of being connected to a video output device, further comprising a video data output unit arranged to output video data to said video output device, wherein said presentation signal producer produces video data for displaying the request for adjustment of the audio volume of said audio output device as said presentation signal, and said video data output unit outputs the presentation signal produced by said presentation signal producer to said video output device.
 5. The communication device according to claim 1, wherein said audio output device is configured to be capable of changing an amount of delay of the input audio data, said difference detector includes a delay detector arranged to detect the amount of delay of the audio data input to said audio data input unit with respect to the audio data received by said receiver as said difference when the audio is output by said audio output device based on the audio data output from said audio data output unit and the audio data based on said output audio is input from said audio input device to said audio data input unit, and said presentation signal includes a request for operation involving change of the amount of delay of the audio data in said audio output device as the request for change of the output condition of said audio.
 6. The communication device according to claim 5, which is capable of being connected to a video output device, wherein said video output device is configured to be capable of displaying video based on video data in a selected display mode of a plurality of display modes and is set such that an amount of delay of the video data differs according to the plurality of display modes, said audio output device is configured to adjust the amount of delay of the audio data such that the audio data is synchronized with the video data in said video output device, said receiver is configured to be capable of receiving the video data transmitted from said another device, said communication device further comprises a video data output unit arranged to output the video data received by said receiver, and said presentation signal includes a request for operation of changing the display mode of said video output device as the request for the operation involving change of the amount of delay of the audio data in said audio output device.
 7. The communication device according to claim 6, wherein said presentation signal producer produces video data for displaying the request for the operation of changing the display mode of said video output device as said presentation signal, and said video data output unit outputs the presentation signal produced by said presentation signal producer to said video output device.
 8. The communication device according to claim 5, further comprising: a delay unit arranged to delay the audio data received by said receiver; and a removal processing unit arranged to perform processing of removing components corresponding to the audio data delayed by said delay unit from the audio data input to said audio data input unit, wherein said transmitter is configured to transmit the audio data processed by said removal processing unit to said another device.
 9. A communication device capable of transmitting and receiving video data and audio data to and from another device and being connected to an audio input device and an audio output device, comprising: a receiver configured to be capable of receiving the audio data transmitted from said another device; an audio data output unit arranged to output the audio data received by said receiver to said audio output device; an audio data input unit to which audio data is input from said audio input device; a transmitter configured to be capable of transmitting the audio data input by said audio data input unit to said another device; a difference detector arranged to detect a geometric or temporal difference between a waveform represented by the audio data received by said receiver and a waveform represented by the audio data input by said audio data input unit when audio is output by said audio output device based on the audio data output from said audio data output unit and audio data based on said output audio is input from said audio input device to said audio data input unit; a determiner arranged to determine whether or not the difference detected by said difference detector exceeds a predetermined acceptable amount; and a control signal producer arranged to produce a control signal for changing an output condition of the audio based on the audio data output by said audio data output unit when said determiner determines that the difference exceeds said acceptable amount.
 10. A communication method using a communication device capable of transmitting and receiving video data and audio data to and from another device and being connected to an audio input device and an audio output device, comprising the steps of: receiving the audio data transmitted from said another device using a receiver of said communication device; outputting said received audio data from an audio data output unit of said communication device to said audio output device; inputting audio data from said audio input device to an audio data input unit of said communication device; transmitting the audio data input to said audio data input unit from a transmitter of said communication device to said another device; detecting a geometric or temporal difference between a waveform represented by the audio data received by said receiver and a waveform represented by the audio data input by said audio data input unit when audio is output by said audio output device based on the audio data output from said audio data output unit and audio data based on said output audio is input from said audio input device to said audio data input unit; determining whether or not said detected difference exceeds a predetermined acceptable amount; and outputting a presentation signal for presenting to a user a request for change of an output condition of the audio based on the audio data output by said audio data output unit when it is determined that said difference exceeds said acceptable amount. 